Topic: Galaxy Zoo1: Publication Data Release Question

ctr06001

« on: September 06, 2012, 09:05:20 pm »
I hope this is the correct place to ask this.   I emailed the team, but the website said they are very busy and may never get back to me.  Here is the email I sent them.

Hi guys,

First off, fantastic work!  My name is Chris *** and I'm in a master's computer science program at ***.  We needed to find some numerical data on which to research data mining techniques, and I came across your site (data.galaxyzoo.org).  However, I seem to have a problem.

In your paper (Galaxy Zoo 1: Data Release of Morphological Classifications for nearly 900,000 galaxies), on page 9, you have the following information about your "Table 2" data:

Galaxies flagged as ‘elliptical’ or ‘spiral’ require 80 percent of the vote in that category after the debiasing procedure has been applied; all other galaxies are flagged ‘uncertain’.

Can you clarify this?  I have your "Table 2" data loaded in a database, and with the following query I can see thousands of records that don't follow this rule:

select count(1) from galaxy_zoo_ra_dec_normalized where debiased_cs < 0.8 and flag_s = 1;
--43,087

Many of these records flagged as spiral actually have debiased spiral vote fractions ranging from 0.79 all the way down to 0.095.  It doesn't make sense, unless I don't understand your methodology well enough.  Below are some specific SDSS_IDs in case you have time to reference them.

587742611343081652
587732484374069394
587730774409216499
587727180602343544
587736546310816086

I appreciate any help and insight you can give.  Thanks!

Chris

If anyone here is familiar with this data, some insight would be fantastic!
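
For anyone wanting to repeat this kind of check, here is a minimal sketch that extends the query above into a cross-tabulation of the spiral flag against the 0.8 cut on the debiased vote. It reuses the table and column names from the query (galaxy_zoo_ra_dec_normalized, debiased_cs, flag_s); the objid column, the in-memory SQLite database, and every row of data are assumptions made up for illustration, not values from the Galaxy Zoo release.

```python
import sqlite3

# Synthetic rows (objid, debiased_cs, flag_s), invented purely for illustration.
ROWS = [
    (1, 0.95, 1),
    (2, 0.79, 1),
    (3, 0.10, 1),
    (4, 0.85, 0),
    (5, 0.30, 0),
]


def crosstab(rows, threshold=0.8):
    """Count rows for each (flag_s, debiased_cs >= threshold) combination."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "create table galaxy_zoo_ra_dec_normalized "
        "(objid integer, debiased_cs real, flag_s integer)"
    )
    conn.executemany(
        "insert into galaxy_zoo_ra_dec_normalized values (?, ?, ?)", rows
    )
    cur = conn.execute(
        """
        select flag_s,
               debiased_cs >= ? as passes_threshold,
               count(*) as n
        from galaxy_zoo_ra_dec_normalized
        group by flag_s, passes_threshold
        order by flag_s, passes_threshold
        """,
        (threshold,),
    )
    result = cur.fetchall()
    conn.close()
    return result


if __name__ == "__main__":
    for flag_s, passes, n in crosstab(ROWS):
        print(f"flag_s={flag_s} passes_threshold={passes} n={n}")
```

The rows with flag_s = 1 and passes_threshold = 0 are the ones the original query counts; the rows with flag_s = 0 and passes_threshold = 1 are the opposite kind of disagreement, which the original query does not look for.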

bamford

« Reply #1 on: September 10, 2012, 08:46:25 am »
Hi Chris,

I've had this question asked a few times, so we obviously didn't do a good enough job of explaining it in the Lintott et al. data release paper.

The provided flags are there for the convenience of those who do not want to get into the details too much, and they are based upon a threshold of 0.8.  However, there is a complication in the debiasing.  The classification bias depends on whether one uses the type likelihoods directly or applies a threshold; the bias is worse if thresholds are used.  I therefore applied bias corrections computed in a consistent fashion for each case.  For the debiased type likelihoods, I computed the bias correction based on the elliptical/spiral ratio using the likelihoods directly.  For the type flags, I debiased the raw type likelihoods using a correction based on the elliptical/spiral ratio determined after applying a 0.8 threshold, and then applied the same threshold to produce the flags.  Therefore, the type flags do not correspond to simply applying a 0.8 threshold to the debiased type likelihoods, though for many galaxies the two will agree.

Further details about the bias correction are in the appendix of the Bamford et al. (2009) paper.

Cheers,
Steven.
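
The explanation above can be illustrated with a toy sketch. It is not the published correction: the debias function below is a deliberately simplified adjustment of the elliptical/spiral ratio, and the two correction values (C_LIKELIHOOD, C_THRESHOLD) are hypothetical numbers chosen only to show how a flag built from one correction can disagree with a 0.8 cut on a likelihood built from another. The real corrections are described in the appendix of Bamford et al. (2009).

```python
THRESHOLD = 0.8

# Hypothetical correction strengths (in log10 of the elliptical/spiral ratio).
# They are NOT values from the papers; the second is larger only because the
# bias is described as worse when thresholds are used.
C_LIKELIHOOD = 0.3  # stand-in: correction derived from the likelihoods directly
C_THRESHOLD = 0.9   # stand-in: correction derived after applying the threshold


def debias(p_el_raw, p_cs_raw, log_ratio_correction):
    """Toy adjustment: reduce p_el/p_cs by 10**correction, keeping the sum fixed."""
    if p_cs_raw == 0:
        return p_el_raw, p_cs_raw  # ratio undefined; leave values unchanged
    total = p_el_raw + p_cs_raw
    ratio = (p_el_raw / p_cs_raw) / 10 ** log_ratio_correction
    p_cs = total / (1 + ratio)
    return total - p_cs, p_cs


def classify(p_el_raw, p_cs_raw):
    """Return the debiased spiral likelihood, the flag, and the naive 0.8 cut."""
    # Path 1: debiased likelihood, using the likelihood-based correction.
    _, debiased_cs = debias(p_el_raw, p_cs_raw, C_LIKELIHOOD)
    # Path 2: flag, using the threshold-based correction and then the threshold.
    _, cs_for_flag = debias(p_el_raw, p_cs_raw, C_THRESHOLD)
    return {
        "debiased_cs": debiased_cs,
        "flag_s": int(cs_for_flag >= THRESHOLD),
        "naive_flag_s": int(debiased_cs >= THRESHOLD),
    }


if __name__ == "__main__":
    print(classify(0.5, 0.5))
```

With these made-up numbers, a galaxy with raw votes split evenly ends up with a debiased spiral likelihood of about 0.67 but is still flagged as spiral, which is the same kind of mismatch the query in the first post picks up.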

ctr06001

« Reply #2 on: September 18, 2012, 02:42:00 pm »
Ah, thank you very much for the reply, bamford.  That makes a little more sense now, so I'll go back and reread the original paper and the paper you mentioned in your post.  I appreciate the help, and love the work you guys are doing with this.  Very interesting.