### April 23, 2014

Someone wrote in: We are about to conduct a voting list experiment. We came across your comment recommending that each item be removed from the list. Would greatly appreciate it if you take a few minutes to spell out your recommendation in a little more detail. In particular: (a) Why are you "uneasy" about list
[…]

On the 23rd January 2014, exactly one day after the 30th anniversary of the Apple Mac, I took delivery of my first ever Apple computer – a late 2013 model MacBook Air. I still heavily use Windows and Linux machines at home and at work but the laptop I cart around with me is now

Today's post is an email interview with Fawn Nguyen, who teaches math at Mesa Union Junior High in southern California. Fawn is on the leadership team for UCSB Mathematics Project that provides professional development for teachers in the Tri-County area. She is a co-founder of the Thousand Oaks Math Teachers' Circle. In an effort to share and learn from other math teachers, Fawn blogs at Finding

E. J. Wagenmakers writes: Remember I briefly talked to you about the subjective assessment of evidence? Together with Richard Morey and myself, Annelies Bartlema created a short questionnaire that can be done online. There are five scenarios and it does not take more than 5 minutes to complete. So far we have collected responses from
[…]

I don't know if this is a sign of the times, but this Sunday Morning Insight entry entitled Why You Should Care About Phase Transitions in Clustering got 35+ Google recommendations. It's a record as far as this blog is concerned. Why a sign of the times? Because I think that, in the same way we see a convergence between sensing and machine learning, there seems to be a similar convergence in themes related to the general topic of advanced matrix factorizations in theoretical computer
[…]

4:44 AM | The Shape of Information

A brief synopsis of my general-audience talk at Haverford College. I'm currently visiting Haverford College at the invitation of +Sorelle Friedler as part of Haverford's big data lecture series. Today's talk was a general audience talk about data mining, titled 'The Shape Of Information' (GDrive link). What makes data mining so powerful, and so ubiquitous? How can the same set of techniques identify patients at risk for a rare genetic disorder,
[…]

### April 22, 2014

10:18 PM | A helpful structure for analysing graphs

Mathematicians teaching English: “I became a maths teacher so I wouldn’t have to mark essays”; “I’m having trouble getting the students to write down their own ideas”; “When I give them templates I feel as if it’s spoon-feeding them”. These …

Yuejie Chi just sent me the following: Dear Igor, I'm a follower of your blog and appreciate your consistent contribution to the research community... I am writing to ask if you could kindly advertise our GlobalSIP symposium on "Information Processing for Big Data" on your blog. The symposium contains three tracks - all of them are related to high-dimensional data but with different focuses. Laura Balzano, Yao Xie and I are co-organizing the track on "Subspace Methods for
[…]

8:18 PM | Drexel on Monday 4/28

Looks interesting if you're in the area. I plan to be at the lunch. From: Drexel University's LeBow College of Business <announce@lebow.drexel.edu> Date: Thu, Apr 17, 2014 at 11:14 AM Subject: School of Economics Presents: 2 Presentations by Dr. Preston McAfee, Director, Google Strategic Technologies: 4.28.14 Please join Drexel School of Economics for one or both of the two presentations that will be given by Dr. Preston McAfee, Director, Google Strategic
[…]

6:56 PM | MFM2P - Day 45

Day 45: I'm starting to jump around the Estimation 180 site. My students still enjoy it so we keep doing it. Today we did a variety - one song length, one number of carts, one distance between two cities and one number of pennies to form a shape. I realize that they can't use previous estimations to help with the new ones when I do this, but I will go back and do more of each. Back to equations of lines. We worked on slope last week and today we started by writing the equation of a line given its
[…]

3:30 PM | EMaC : Robust Spectral Compressed Sensing via Structured Matrix Completion - implementation -

I love it when there is an implementation; it gets even more interesting when the paper shows phase transitions. Quite simply, the authors send the signal that while theoretical developments are OK, the only way to really compare all these solvers is to show how each of them performs on the acid test of the sharp phase transitions. Here is a new example of that: Robust Spectral Compressed Sensing via Structured Matrix Completion by Yuxin Chen, Yuejie Chi. The paper
[…]

1:01 PM | Ticket to Baaaaarf

A link from the comments here took me to the wonderfully named Barfblog and a report by Don Schaffner on some reporting. First, the background: A university in England issued a press release saying that "Food picked up just a few seconds after being dropped is less likely to contain bacteria than if it is

1:00 PM | What do grades measure?

[I wrote this in the middle of the big SAT thread and I thought I had posted it weeks ago but it appears that I never got around to it. So better late than never...] As discussed before, many of the calls for getting rid of the SAT use the argument that high school grades are a better indicator of college success so we don't need the SAT. There's a modeling fallacy here (also as previously discussed), but putting that aside, the suggestion that we should rely almost entirely on grades as a […]

12:49 PM | ESF workshop Glasgow – 16th-18th June 2014

Thanks to the kind support of the European Science Foundation the University of Glasgow will be hosting a workshop in June focused on reconnection events in classical, quantum and classical fluids. Reconnections are dramatic events, leading to irreversible changes in the topology of a system. In particular they are crucial in understanding magnetic fields in rarefied plasmas, […]

12:20 PM | New Twitter account: UnitFact

I’ve started a new Twitter account @UnitFact for tweets about units of measurement, constants, dimensional analysis, etc.

The organizers of the AdAuctions workshop have extended the submission deadline to April 27. Updated information on the workshop is as follows. The Tenth Ad Auction Workshop (here is the call for papers) Date: June 8, 2014, 8:30-15:30 Submission deadline: April 27, 2014, (23:59 Hawaii time) Organizing Committee: adauctions2014@gmail.com Itai Ashlagi (MIT Sloan) Patrick Jordan […]

This is a guest post written by Stephanie Yang and reposted from her blog. Stephanie and I went to graduate school at Harvard together. She is now a quantitative analyst living in New York City, and will be joining the data science team at Foursquare next month. Last week’s hysterical report by the Daily Show’s Samantha Bee […]

I recently wrote a paper with Afonso Bandeira and Ben Recht, and now it’s on the arXiv (perhaps you heard from Igor). The main idea is dimensionality reduction for classification, but the motivation is different: Compressed sensing allows for data recovery after compressive measurements, but classification should require even less information. Since classification is a many-to-one […]

Justin Haldar just sent me the following: Hi Igor, Wanted to let you know that we've finally put out the public release of the LORAKS code (that you featured on Nuit Blanche back in December). There is a technical report that describes the implementation and walks through all of the examples included with the code available here: http://sipi.usc.edu/reports/abstracts.php?rid=sipi-414 The tech report (and the corresponding demo code) build on the original published LORAKS paper, showing several
[…]

### April 21, 2014

9:59 PM | Rotationally symmetric Venn diagrams

No doubt you will have seen a Venn diagram. They are a wonderful way of presenting logical information. For example, they allow us to illustrate the fact that centaurs lie in the intersection of objects with male torsos and horse legs (Figure 1). Figure 1. Not all male torsos are connected to horse legs and vice versa. However, we see that centaurs do lie in the intersection. Recently there has been an upsurge in using Venn diagrams as a way of illustrating jokes, or song titles. My personal favourite
[…]

I just added a new article to Wikipedia on indifference graphs (also known as unit interval graphs or proper interval graphs): the graphs formed from sets of points on the real line by connecting every two points whose distance is less than one. There are many papers on algorithms for going from the graph to a geometric representation in linear time. The following method for the reverse problem, going from the set of points (or equivalently unit intervals) to its graph, must be known, possibly in

The Stan Model of the Week showcases research using Stan to push the limits of applied statistics. If you have a model that you would like to submit for a future post then send us an email. Our inaugural post comes from Nathan Sanders, a graduate student finishing up his thesis on astrophysics at Harvard.

I mentioned this paper before but now there is a newer version and, most importantly, an implementation. PETRELS: Parallel Subspace Estimation and Tracking by Recursive Least Squares from Partial Observations by Yuejie Chi, Yonina C. Eldar, Robert Calderbank. Many real world data sets exhibit an embedding of low-dimensional structure in a high-dimensional manifold. Examples include images, videos and internet traffic data. It is of great significance to reduce the storage requirements and
[…]

3:00 PM | An Un-PAC-learnable Problem

In a previous post we introduced a learning model called Probably Approximately Correct (PAC). We saw an example of a concept class that was easy to learn: intervals on the real line (and more generally, if you did the exercise, axis-aligned rectangles in a fixed dimension). But PAC learning wouldn't be an interesting model if every concept class was PAC-learnable. So as a technical
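The earlier post is not reproduced in this excerpt, so the following is only a sketch of one common way to learn an interval from labeled samples: return the tightest interval that contains every positive example. The function name `learn_interval` and the sample format (a list of `(x, label)` pairs) are assumptions made for this illustration.

```python
from typing import List, Optional, Tuple


def learn_interval(
    samples: List[Tuple[float, bool]]
) -> Optional[Tuple[float, float]]:
    """Return the smallest closed interval containing all positive samples.

    Returns None when there are no positive samples, which corresponds to
    the hypothesis that labels every point as negative.
    """
    positives = [x for x, label in samples if label]
    if not positives:
        return None
    return (min(positives), max(positives))


if __name__ == "__main__":
    data = [(0.1, False), (0.4, True), (0.7, True), (0.9, False)]
    print(learn_interval(data))
```

The returned interval always sits inside the true one (assuming the labels are consistent with some interval), so its only errors are false negatives near the two endpoints.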

I’ll be giving a talk at Purdue University on Saturday, May 3 as part of the 65th Midwest Theory Day. If any readers happen to live in West Lafayette, Indiana and are interested in hearing about some of my recent research, you can register for free by April 28 (one week from today). Lunch and snacks are provided, and the […]

Two years ago, Greenpeace put out a report titled “How clean is your cloud,” taking many of the IT giants to task for their lack of commitment to sustainability in their data centers. Now, a few years later, Greenpeace is still at it and has been pushing hard with a mixture of yearly public praise/shaming […]

2:54 PM | Ticket to Baaaath

Ooooooh, I never ever thought I'd have a legitimate excuse to tell this story, and now I do! The story took place many years ago, but first I have to tell you what made me think of it: Rasmus Bååth posted the following comment last month: On airplane tickets a Swedish "å" is written as

1:00 PM | On deck this week

Mon: Ticket to Baaaath Tues: Ticket to Baaaaarf Wed: Thinking of doing a list experiment? Here's a list of reasons why you should think again Thurs: An open site for researchers to post and share papers Fri: Questions about "Too Good to Be True" Sat: Sleazy sock puppet can't stop spamming our discussion of compressed

About a week ago I taught my tutorial, Bayesian Statistics Made Simple, at PyCon 2014 in Montreal. My slides, the video, and all the code, are on this site. The turnout was great. We had a room full of enthusiastic Pythonistas who are now, if I was successful, enthusiastic Bayesians. Toward the end, one of the participants asked a great question based on his work (if I understand the background correctly) at Do Something.org. Here's my paraphrase: "A group of people sign
[…]

Kaggle competitions are potentially pretty cool. Kaggle supplies in-sample data ("training data"), and you build a model and forecast out-of-sample data that they withhold ("test data"). The winner gets a significant prize, often $100,000.00 or more. Kaggle typically runs many such competitions simultaneously. The Kaggle paradigm is clever because it effectively removes the ability for modelers to peek at the test data, which is a key criticism of model-selection procedures that claim to