Friday, 20 April 2018

The golden rule

The first point is that I am not a Christian, the second is that I am not a Marxist or really even a socialist. I am a rational humanist but not quite from the same view as Steven Pinker. I am a Liberal, a moderate, a fence sitter and someone who moves with the shifts of contingency. Most of all I would say that I am a pragmatist. I take ideas from when and where they are needed to fit the current situation. I understand that complexity makes long-term inflexible beliefs dangerous and often counter-productive. I do not believe in the straitjacket of a particular political/social/economic belief system. However, there are one or two fundamentals that we should apply.

The golden rule from Christianity is "do as you would be done by". All other religions have equivalent versions of the basic idea that we should all be nice to one another and behave in ways that we would expect others to behave towards us. This is part of the basis on which we construct human society.

There are those that argued and continue to argue that society is not actually fundamental and who use the arguments of evolutionary science to dispute that society is necessary. These people are not just ignorant, they are an abomination. Their arguments have long been refuted by Axelrod and his work on cooperation and the experiments based on the Iterated Prisoner's Dilemma.

You may think that abomination is too strong a word to apply to them but it is not. These are the hawks and defectors in game theory. Why are they so dangerous? They are so dangerous because they not only pursue an egoistical view of the world where only their self-interest is served, they also undermine trust in general and between everyone else. They are a cancer in a social world and only ostracism is a fitting punishment for them. If they remain part of society then they are parasites taking from the majority doves. They are even deceitful enough to try to convince the population that hawks outnumber doves, but this can never be the case. Tit-for-tat proves this. Defection is the exception and not the rule.
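To make the tit-for-tat point concrete, here is a minimal sketch of the Iterated Prisoner's Dilemma in Python. The payoff values (5, 3, 1, 0) are the ones usually quoted for Axelrod's tournaments; the function names and the ten-round match length are my own assumptions for illustration, not anything taken from his code.

```python
# Payoffs as (my score, opponent's score), keyed by (my move, opponent's move).
# "C" = cooperate (dove), "D" = defect (hawk).
PAYOFFS = {
    ("C", "C"): (3, 3),
    ("C", "D"): (0, 5),
    ("D", "C"): (5, 0),
    ("D", "D"): (1, 1),
}


def tit_for_tat(opponent_history):
    """Cooperate first, then copy whatever the opponent did last round."""
    return opponent_history[-1] if opponent_history else "C"


def always_defect(opponent_history):
    """The pure hawk: defect regardless of history."""
    return "D"


def play(strategy_a, strategy_b, rounds=10):
    """Play a repeated game and return the total scores (a, b)."""
    moves_a, moves_b = [], []
    score_a = score_b = 0
    for _ in range(rounds):
        a = strategy_a(moves_b)  # each player only sees the other's past moves
        b = strategy_b(moves_a)
        pay_a, pay_b = PAYOFFS[(a, b)]
        score_a += pay_a
        score_b += pay_b
        moves_a.append(a)
        moves_b.append(b)
    return score_a, score_b


if __name__ == "__main__":
    print("TFT vs TFT:      ", play(tit_for_tat, tit_for_tat))
    print("TFT vs defector: ", play(tit_for_tat, always_defect))
    print("defector vs same:", play(always_defect, always_defect))
```

Over ten rounds two tit-for-tat players earn 30 each, a defector gains only a small one-off advantage against tit-for-tat (14 against 9), and two defectors are stuck on 10 each. A population of hawks therefore does worse than a population of reciprocating doves, which is the sense in which defection cannot be the rule.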

The difficulty is that most of our politicians, media moguls and financiers are these abominations and this has undermined trust in society so severely that we are now staggering from one crisis to another. I agree with Pinker that the world is significantly better now than at any time in the past. Where I disagree is that we are orders of magnitude worse off than where we should be because of these parasites in our midst.

I was reading Marx and Marxism by Gregory Claeys when it became clear to me how we have strayed from the golden rule. It comes from his definition of what a socialist wants.

Socialists seek to reorganize society to satisfy the needs of the majority without the poverty, inequality, competition and waste associated with capitalism. Like many of their Utopian predecessors they imagine ways of belonging to groups and of relating to other people which are more generous, kind and peaceful, and which minimize or abolish exploitation and oppression. They embrace values like friendship, trust, harmony, fraternity, unity, and solidarity, which seems to be waning in modern society, but which might be recaptured or created anew. p28
Who can argue against aiming for generosity, kindness and peace in the future, or against avoiding oppression and exploitation? Who will argue that friendship, trust, harmony and solidarity are bad things? If you are against this then you are opposed to the golden rule to do to others as you would have others do to you.

That is what these abominations try to argue, and they have sown their seeds of mistrust, deceit, inequality and exploitation for the last 50-60 years as they seek to roll back the post-war settlements. These are the neo-cons, where con is the most appropriate word: they are deceivers. Sometimes they try to hide by using the term neo-liberals but they are not liberals. Liberals did not believe in unconstrained capitalism; they believed that there had to be a communal input even if there was to be as much personal freedom as possible. Look at the social housing of Cadbury's and all of the philanthropy of people like Carnegie. For them philanthropy was part of the business and not just an add-on. Even Ford understood that he needed to pay his workers enough that they could become customers. Compare this to someone like Larry Ellison who is a philanthropist at gun-point.

There is a golden rule and that is that society does exist and it does so because most of us believe in being nice to one another. Society does not triumph over the individual as we still do have individual needs. Instead they are locked together in the same way as waves and particles are locked together in physics. There is a complementary duality between the rights of the individual and the communal benefits that we all derive from society. Those that deny society are unnatural and wrong; they are liars.

I find it very hard to reconcile this golden rule with the behaviour of the current politicians in both the US and the UK. Brexit is the child of these abominations and Trump is these abominations personified. Both the UK Conservative party and the US Republican party have allowed these abominations to dominate and until they can cleanse themselves they deserve to have no future part to play in government.

We can have another Utopia when we cut out these cancerous individuals. That does not mean the Leninist and Trotskyist view that violence is the answer. It is simpler than that: we just need to ignore them, not vote for them or buy from them. Democracy means that in the end they depend on the community through capitalism as we are their consumers. We are their voters and we create their success. If we deny them this then they will wither and fail. We just have to see past their deceptions and false promises.

Wednesday, 13 December 2017

Roger Stone

Read Jon Ronson - The Elephant in the Room

Alex Jones is paranoid.

Stone was introduced to Jones by Reeves - the grandson of the original Superman
Stone had a business with Manafort and they also worked with Lee Atwater.

Stone worked for Savimbi in Angola
Bob Dole
Landmines
Marcos

Stone knew Roger Ailes
Anti-globalist/establishment

Except Stone is part of the establishment - he was Nixon's Counsel.

They have a cult mentality - they will kill the GOP
This is a coup - they do not want to win hearts.

Alex Jones hijacked the Young Turks
Stone and Jones brought up Clinton rape accusations to the debates.

Manages media - creates false flag confrontations

SuperHubs

This is a slightly puzzling book 

p22 the author is partially wrong. Hierarchies do work and Herbert Simon showed why, but this was not because of top down control. They can be non-directed and spontaneously arise.

p25 gatekeepers to the rich and powerful. Is this a good idea?

p27 - why did the author write the book?

  • potentially undermines her credibility.
  • makes people wary in talking to her.
  • obvious that she is a Soros fan.
p32 "Money is mostly created by banks offering loans" regulated by central banks' interest rates and asset purchases (gold etc.). At the minute with QE $17 trillion has been pumped into the markets and created huge asset bubbles such as Bitcoin. The intention was for the money to be used for investment and to kickstart growth but this has failed. It has remained in the markets and the banks and not been distributed to the wider economy. This is going to result in a very serious and drastic need for realignment.

Fundamentally commodities are more important than other markets because we cannot live without them. We depend on them for:
  • Shelter
  • Warmth 
  • Food
As Apple share price rises the return per share has fallen because this is pure speculation and not investment. 

p53 power of the central banks is greater than the politicians. Brexit proves this wrong. You can get a populist vote in ignorance of how the central banks work and this can create a suicidal economic policy.


Creating Research Objects

There is a serious problem with scientific fraud and the reproducibility problem. We need to think about ways in which we can check the integrity of a study.

http://www.researchobject.org 

This is also a way of encoding know how.

Metadata is too time consuming to create at the minute. It needs to be built into the planning and research process itself (GitHub?)

Want to create a knowledge exchange report
Open Research Data - Manyika
Rules for growth - Stodden

Data management plans are required by research councils

  • Integrity checking
  • Hashing
  • What are open file formats?
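As a note to self on what integrity checking by hashing might look like in practice, here is a minimal sketch using Python's standard `hashlib`. The idea of a per-directory "manifest" and the function names `build_manifest` and `changed_files` are my own assumptions, not part of the Research Object specification.

```python
import hashlib
from pathlib import Path


def sha256_of_file(path, chunk_size=65536):
    """Hash a file in chunks so large data files do not need to fit in memory."""
    digest = hashlib.sha256()
    with open(path, "rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def build_manifest(directory):
    """Map every file under `directory` (by relative path) to its SHA-256 digest."""
    root = Path(directory)
    return {
        p.relative_to(root).as_posix(): sha256_of_file(p)
        for p in sorted(root.rglob("*"))
        if p.is_file()
    }


def changed_files(directory, manifest):
    """List files that were added, removed or altered since the manifest was made."""
    current = build_manifest(directory)
    names = set(manifest) | set(current)
    return sorted(n for n in names if manifest.get(n) != current.get(n))
```

The manifest could be stored alongside the data when a study is deposited; anyone re-running the analysis can then call `changed_files` to confirm they are working from the same bytes.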

Blockchain for the trust layer
  • Politicians - regulations and policies
  • Qualifications
  • Medical Records
  • Passports
  • Forensics
  • Risk assessment and rating
Restoring trust is essential
(Byzantine Generals' Problem)

Money is a trust system representing work done
Reputation is also a trust system but this is only as strong as the weakest link.

Thinking about the bootstrap


  1. Bootstrap samples experimental units but in phylogenetics you sample the VARIABLES - sites.
  2. How should we treat sites?
    1. Remove totally variant sites?
    2. Remove sites where a row is missing?
  3. You cannot say that parametric and non-parametric are the same thing. They are correlated but not directly comparable.
    1. Carry out FastTree with H5N8, then H5 then N8
    2. Use the parametric and non-parametric bootstraps
    3. Use the CONSEL measures as well.
  4. Having more bootstraps than 100 makes NO difference to the bootstrap values. They converge quickly empirically.
    1. This is far below the theoretical numbers needed, but Efron says that this is usual.
    2. Suggests that sites are linked and so there is less independent variability than it appears.
    3. Need to experiment with conserved sites.
    4. Need to experiment with the substitution models to look at sensitivity and also gamma.
  5. There is a lack of independence between sites in the evolutionary models but this is IGNORED in the bootstrap calculations. You should bootstrap codons and not individual bases.
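The difference between bootstrapping individual bases and bootstrapping codons can be sketched as follows. This is a minimal illustration, assuming the alignment is held as a dict of equal-length strings and is in frame; the function name `bootstrap_sites` and the `block` parameter are hypothetical and not taken from FastTree or any other package.

```python
import random


def bootstrap_sites(alignment, block=1, rng=None):
    """Resample alignment columns with replacement.

    block=1 resamples single sites (the usual phylogenetic bootstrap);
    block=3 resamples whole codons so linked bases stay together.
    """
    if rng is None:
        rng = random.Random()
    lengths = {len(seq) for seq in alignment.values()}
    if len(lengths) != 1:
        raise ValueError("all sequences must have the same aligned length")
    length = lengths.pop()
    if length % block:
        raise ValueError("alignment length must be a multiple of block")
    n_blocks = length // block
    # The same column picks are applied to every sequence, so we are
    # sampling the variables (sites), not the experimental units (taxa).
    picks = [rng.randrange(n_blocks) for _ in range(n_blocks)]
    return {
        name: "".join(seq[i * block:(i + 1) * block] for i in picks)
        for name, seq in alignment.items()
    }
```

Calling `bootstrap_sites(alignment, block=3)` repeatedly gives codon-level pseudo-replicates that can be fed to the same tree-building step as the site-level ones, so the two sets of support values can be compared directly.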

Need to create synthetic data where the true tree is known. This can be used to test:
  1. Effects of sampling by censoring the data.
  2. Evaluate modeltest.
  3. Check trees from bad evolutionary models against the best models (probably the same!!!)
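A rough sketch of how such synthetic data could be generated is below. It assumes the simplest possible substitution model (Jukes-Cantor, where after a branch of length t substitutions per site a base differs with probability 3/4 (1 - exp(-4t/3))), no indels and no rate variation. The nested-list tree format, the `TRUE_TREE` example and the function names are my own inventions for illustration.

```python
import math
import random

BASES = "ACGT"

# Hypothetical four-taxon true tree: ((A:0.05,B:0.05):0.1,(C:0.05,D:0.05):0.1)
# A node is either a leaf name or a list of (subtree, branch_length) pairs.
TRUE_TREE = [
    ([("A", 0.05), ("B", 0.05)], 0.1),
    ([("C", 0.05), ("D", 0.05)], 0.1),
]


def evolve(sequence, branch_length, rng):
    """Mutate a sequence along one branch under the Jukes-Cantor model."""
    p_change = 0.75 * (1.0 - math.exp(-4.0 * branch_length / 3.0))
    out = []
    for base in sequence:
        if rng.random() < p_change:
            out.append(rng.choice([b for b in BASES if b != base]))
        else:
            out.append(base)
    return "".join(out)


def simulate(tree, root_sequence, rng):
    """Evolve root_sequence down the tree and return {leaf name: sequence}."""
    if isinstance(tree, str):
        return {tree: root_sequence}
    leaves = {}
    for subtree, branch_length in tree:
        child_sequence = evolve(root_sequence, branch_length, rng)
        leaves.update(simulate(subtree, child_sequence, rng))
    return leaves


if __name__ == "__main__":
    rng = random.Random(42)
    root = "".join(rng.choice(BASES) for _ in range(60))
    for name, seq in simulate(TRUE_TREE, root, rng).items():
        print(name, seq)
```

Because the generating tree is known, the leaves can be censored, bootstrapped or analysed under a deliberately wrong model, and the recovered tree compared against `TRUE_TREE`.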

The process of learning

Genetic: Very slow learning and wasteful because it depends on selection. This works between generations.

Taught: Fast learning that sums up what happens in a community.

Exploration: Novel learning by experiences. This is learning through interaction

Distances in psychology.

not transitive
not symmetrical

Tversky 1977 - features of similarity
AI cannot make human decisions until it gets beyond clustering distances.
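To see why a feature-based similarity need not be symmetrical, here is a small sketch of Tversky's ratio model, in which shared features are weighed against the distinctive features of each item. The default weights and the toy feature sets are my own assumptions, chosen only to show the asymmetry.

```python
def tversky_similarity(a, b, alpha=0.8, beta=0.2):
    """Similarity of a to b: common / (common + alpha*|a - b| + beta*|b - a|).

    With alpha != beta the measure is asymmetric, so it is not a metric
    distance; with alpha == beta == 1 it reduces to the Jaccard index.
    """
    a, b = set(a), set(b)
    common = len(a & b)
    denominator = common + alpha * len(a - b) + beta * len(b - a)
    return common / denominator if denominator else 1.0


if __name__ == "__main__":
    rich = {"stripes", "tail", "claws", "roars"}  # item with many features
    poor = {"stripes", "tail"}                    # item with few features
    print(tversky_similarity(rich, poor))  # rich judged against poor
    print(tversky_similarity(poor, rich))  # poor judged against rich
```

The two calls give different values because the distinctive features of the first argument are weighted more heavily than those of the second, which is the kind of judgement a symmetric clustering distance cannot represent.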

Undoing Project p 107-114.

Belief in the law of small numbers - The Undoing Project p157-163

Pundits (illiterate "experts") p 168


Saturday, 18 November 2017

The Virus Gene Papers

I think it unlikely that I will be submitting to Virus Gene again in a hurry. We had written a few papers that we knew would be unpopular and sent them to a meh-level journal where we expected to have an easier ride through peer review. The first hint that this wasn't going to be the case was the editor assigned, who happened to have collaborated with a group in direct competition in H9 phylogenetics led by Cattoli, who I had insulted previously.

Anyway, they are now in PeerJ and public so that nobody else runs off and starts using USEARCH in flu phylogenetics and claiming priority.

https://peerj.com/preprints/3166/
https://peerj.com/preprints/3396/

I just wanted to put the referees' comments for the first paper that was rejected here, because they are laughable and in the context of the referees' comments on the second paper they are probably wrong or at least not consistent. I have put my responses here because with a straight reject I get no chance to respond to the editor, who is not going to be on my Christmas card list.

Reviewer #1: General comments
The paper presents the method of classification of H9 lineages using clustering and compares the results with classification based on other methods.
The paper would gain if some practical aspects were added. The title suggests the method is fast, so an approximate time of analysis would be useful, especially that cluster analysis after each run is required and repeated clustering if necessary is suggested.


Specific comments:

Introduction
Explain HMM and SVM abbreviations.

Hidden Markov Models and Support Vector Machines

Materials and methods
Please add the information on the chain length in the BEAST analysis.

2 million

Results
Lines 37-43: "USEARCH identified 19 clusters …" - does it refer to H9 HA? It should be indicated in the text to avoid confusion with subtype identification described above.

19 H9 clusters

Lines 43-44: "The subtype originated in Wisconsin in 1966 and this clade continues to be in circulation" Do the Authors mean that H9N2 subtype was first detected in Wisconsin in 1966?

H9 is first detected in 1966 as part of H9N2

Lines 37-39 (2nd page of results): The sentence "The phylogenetic trees…" is confusing, as only fig. 4 shows tree for clade 12 and it was not divided into subclades.

Easily changed

Lines 50-51 (2nd page of results): Were there 3 or 4 subclades of 14 clade identified?

Easily checked

Discussion
First sentence "The clustering of the influenza viral hemagglutinins using USEARCH proved that clustering can correctly identify the viral subtypes from the sequence data" - the subtype identification was partially correct, as it did not detect H15, and H7 was split into two clusters, so this statement should be revised. It would be interesting to mention with which subtype the H15 sequences were clustered.

I can show that H15 separates out at slightly lower identity. H7 is two adjacent groups so it is correctly identified. It gets 14 out of 15 clusters, which is 93% accuracy, so the method works. 93% is more accurate than is typical for clustering algorithms.

Lines 27-30 (2nd page of discussion): "…small sub-clades of four or less sequence were merged for phylogenetic analysis…" Please explain it in Results.

You cannot make a tree of less than 3 sequences.

Supplementary Figure 3: There are branches labeled with subclade number and some with individual sequence. Please explain it. It is also associated with the comment above.

That would be because labelling a cluster containing one sequence with a cluster name would be stupid. As these clusters were grouped for tree generation it would be misleading to use the cluster number but I can edit them to have both.

Table 3 - missing data in the 5th line

No that does not exist – it is unsupported data in the LABEL paper that is not public and cannot be verified. This was data given to Justin Bahl but not available to anyone else.

Reviewer #2: The automated detection and assignment of IAV genetic data to known lineages and the identification of sequences that don't "fit" existing descriptions is a challenge that requires creative solutions. The authors present a manuscript that proposes a solution to this question and tests it on an extensive H9 IAV dataset. 

Though I find the general question intriguing there are a number of issues. The two major items are: a) as a study on the evolutionary dynamics of H9 IAV, this is missing appropriate context, and the results are not adequately presented or discussed; and b) as a tool to identify known and unknown HA, it generates results that appear to be no different to BLASTn, it isn't packaged so that others may use it in a pipeline/web interface/package, and the generated "clusters" aren't linked to any known biological properties.  I elaborate on a few of these issues below.

1) This is not a novel algorithm: USEARCH has been in use for over 7 years and it has been previously used in IAV diagnostics. Consequently, I would expect the authors to present a novel implementation of the algorithm (e.g., a downloadable package/pipeline, or an interactive interface on the web) or a detailed analysis and discussion of the evolutionary history of the virus in question.  Unfortunately, the authors do not achieve either.

This reviewer is lying. You may search for IAV and USEARCH in Google and you will find NOTHING except the two papers I mentioned, both of which are more recent. It was first used by Webster in 2015 and for a different approach. It is mostly used for analyzing metagenomics projects. It cannot be packaged because, as the paper shows, you have to make decisions about the clustering. It is not just automatic; you have to analyse the appropriate identity and clustering.

2) The introduction is not adequately structured - after reading, I was left confused as to why dividing the H9 subtype into different genetic clades was necessary, i.e. there is no justification provided for the study. The discussion of clades and lineages is particularly convoluted and given the presented information, it is not clear what the authors are trying to achieve (i.e., they move from identifying subtypes, to identifying clades, to lineages, to reassortment, and all possible combinations). Additionally, there are entire sections of the introduction that consist entirely of unsupported statements (lines 39-48 on alignments and tree inference: lines 52-60 on lineage evolution). This section needs to be revised to provide appropriate context and justification for the study.

The reviewer is obviously completely oblivious as to why you want to carry out lineage analysis in influenza. As such they are not competent to review the paper. As the WHO actually has a working party to create these nomenclatures for H5 this argument is ridiculous.

3) There are many figures associated with BEAST analyses. The goals of these analyses are not introduced, and the trees are not presented or described in any meaningful detail. Further, and more concerning, the presented trees appear to be wrong, i.e. the tip dates are not in accordance with the temporal scale.

That would be because the editor had the number of figures reduced. The BEAST analysis is not particularly important other than to show the consistency of the clustering. If the reviewer bothered to read then they would see that one of the trees does not use tip dates and is a maximum likelihood tree and so dates WILL NOT be consistent with the temporal scale if there is variation in mutation rate along one of the branches. This is actually an interesting point as BEAST FAILS completely to generate a reasonable tree with tip dates for that cluster of data. It produces a network with cycles over a wide range of initial parameters.

4) One of the major results (lines 6-16 in the results) is that the USEARCH algorithm can identify the correct subtype of a sequence, most of the time. How does this compare to BLASTn? And, failing to classify a subtype (line 16) is problematic. The authors should consider what the goal of the analysis is, and then present it along with results from similar algorithms, e.g., with the same data, is BLASTn able to identify subtypes?

I am intrigued by how the reviewer thinks that BLASTn works. To do the same task I would need to identify prototypes of each cluster and then use BLASTn to find the rest of the cluster. I would then need to apply some sort of cut-off in order to identify when BLASTn was finding members of other clusters and not the current cluster. In short this is nonsense. They perform different functions, as USEARCH identifies the clusters and not just related sequences. USEARCH produces the results in about 1 minute. Just setting up the BLAST searches would take 10 times longer than this, and analysing their results and doing the correct partitioning would take hundreds of times longer. The title of the USEARCH paper is actually “Searching and clustering orders of magnitude FASTER THAN BLAST”
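The difference between searching and clustering can be shown with a toy example. The sketch below is NOT USEARCH and makes no claim about how USEARCH works internally; it is a deliberately naive identity-threshold clustering on pre-aligned, non-empty sequences, with function names I have made up, just to show that the output is a partition of the whole data set and not a hit list for one query.

```python
def identity(seq_a, seq_b):
    """Fraction of aligned positions at which two sequences agree."""
    if len(seq_a) != len(seq_b):
        raise ValueError("sequences must be aligned to the same length")
    matches = sum(x == y for x, y in zip(seq_a, seq_b))
    return matches / len(seq_a)


def greedy_cluster(sequences, threshold=0.9):
    """Assign each sequence to the first cluster whose representative it matches.

    Returns a list of (representative name, [member names]). A sequence that
    matches no existing representative at the threshold founds a new cluster.
    """
    clusters = []
    for name, seq in sequences.items():
        for representative, members in clusters:
            if identity(seq, sequences[representative]) >= threshold:
                members.append(name)
                break
        else:
            clusters.append((name, [name]))
    return clusters


if __name__ == "__main__":
    toy = {
        "a": "AAAAAAAAAA",
        "b": "AAAAAAAAAT",
        "c": "CCCCCCCCCC",
    }
    print(greedy_cluster(toy, threshold=0.9))
```

A BLASTn search would answer "what looks like sequence a?"; the clustering answers "what groups exist at 90% identity?", and the choice of threshold is exactly the decision that has to be examined by hand.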

5) I do not understand the significance of USEARCH identifying 19 clusters (line 37); and these data are not linked in anyway to a larger more comprehensive description of the evolutionary dynamics of H9 IAV. The authors should refine their hypothesis, and discuss the results - specifically, if a cluster is identified, what does it mean? What is the significance of the previously unidentified clusters? How closely does this align with phylogenetic methods (and the discussed LABEL)?

Um, really this is now getting to be a bad joke. The paper compares to LABEL, a method based on totally subjective cluster names created by influenza researchers. The entire discussion is carrying out exactly what this referee is suggesting in this paragraph. Do they need glasses? Are they suffering from a reading problem? Do they have a brain injury? USEARCH produces some of the clusters from LABEL, faster, more efficiently and correctly. It is completely objective and based on mathematical criteria. There is no bias dependent on convenience sampling because it uses all the data, not just the data a particular lab collects at a particular time. This is a MAJOR step forward in trying to sort out the mess that is influenza nomenclature and shows that most existing attempts are biased, partial and use rules that are not appropriate, such as the need for clades to be homogeneous in subtype, e.g. only H9N2 and not other H9-containing subtypes. The hypothesis is that existing nomenclatures are bad: arbitrary, subjective and not based on mathematical rigour. We have proved this in this paper and in two more analyzing H7 and the internal influenza genes. All show exactly the same point: sound maths, rigorous systematic approaches and excellent biological agreement.

Minor comment:
1) Using my laptop, I aligned all non-redundant H9 HAs (n=5888) in ~2 minutes, and inferred a maximum likelihood phylogeny in ~6 minutes. The argument that phylogenetic methods are slow, particularly given modern tree inference algorithms and implementations on HPCs (e.g. Cipres: http://www.phylo.org) is not accurate. Additionally, alignment issues - particularly within subtypes is a trivial issue. 

Yippy for you, referee 2. Now put them into clusters. Just edit that tree with 5888 sequences and see how long it takes. Meanwhile USEARCH will have done it after 1 minute and it will be mathematically correct and not depend on how you cut the trees. Alignments of large numbers of sequences are unreliable. Regardless of this referee stating that this is unsupported, it is actually supported by a very large literature and best summed up in the work of Robert Edgar, who wrote Muscle and who says DO NOT DO LARGE ALIGNMENTS WITHOUT USING MY USEARCH PROGRAM FIRST. But then it is unlikely that referee 2 actually RTFM for the alignment program. I am sure they ran it without bootstrap and it could not have used tip dates as only BEAST does this.

2) There are a number of methods, e.g, neighbor joining and UPGMA, that use agglomerative clustering methods.

Yes there are. Well done, referee 2, for being a genius and knowing that actually all of phylogenetics is related to clustering. This is the one and only correct statement that they make. All nomenclature and lineage methods depend on agglomerative methods but this is a divisive clustering method which is much less susceptible to convenience sampling. USEARCH is the fastest and best clustering method you can use and it is divisive and not agglomerative.

My comment is that I have NEVER encountered a more partial, incompetent and ignorant referee than referee two. I think that they protest too much because they have too much invested in current methods such as LABEL, which this paper shows to be at best poor and at worst completely wrong.