Our approach. We use the moments of the hitting-time distribution to quantify the relationship of each non-seed vertex to the seed set. The mixture model is fit using a modified expectation-maximization (EM) algorithm that relies on the moment estimates to approximate vertex-specific hitting-time distributions. (Footnote 1: The numerical methods community refers to push algorithms as coordinate descent or relaxation methods.)
This is more computationally efficient than sampling from the graph directly. A discussion of the computational complexity of the EM algorithm is beyond the scope of this paper; however, we note that the expected and observed runtime of our EM algorithm is linear in the number of vertices. The expected and observed runtimes for the conjugate gradient iteration we use to obtain the hitting-time moments are linear in the number of edges.
In contrast to other seed-set expansion techniques we tested, HITMIX was constructed to avoid difficult-to-set tuning parameters. We accomplish this by combining the conjugate gradient iteration and EM iterations, both well-known algorithms with easy-to-set parameters.
This is in contrast to other seed-set expansion algorithms, which often require carefully choosing tuning parameters within an acceptable region, where settings outside this region fail to deliver useful results. For example, diffusion-based methods require the choice of a diffusion parameter; this decision is important, challenging, and often dependent on the particulars of the graph in question.
We also remark that the other methods tested are local, i.e., they need only explore a portion of the graph near the seed set, whereas our approach operates on the entire graph; however, the conjugate gradient iteration leads to a high-performance implementation. Although the hitting-time moments can be approximated via simulation, the computational cost is prohibitive and the simulation-based moments are inherently noisy. Instead, we exploit the fact that a deterministic estimator for the moments has a variational characterization in terms of a convex quadratic functional, due to our assumption that the graph is connected and undirected.
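As a concrete sketch in our own notation (the paper's equations (4) and (5) are not reproduced here, so we write the standard mean hitting-time system): let $L_{\bar{S}\bar{S}}$ denote the submatrix of the graph Laplacian on the non-seed vertices and $d_{\bar{S}}$ their degrees. The estimator minimizes the convex quadratic

\[
J(x) \;=\; \tfrac{1}{2}\, x^{\top} L_{\bar{S}\bar{S}}\, x \;-\; d_{\bar{S}}^{\top} x,
\qquad
\nabla J(x) \;=\; L_{\bar{S}\bar{S}}\, x - d_{\bar{S}},
\]

and connectivity of the graph makes $L_{\bar{S}\bar{S}}$ positive definite, so $J$ is strictly convex with a unique minimizer.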
Hence the global minimum can be determined via the solution of a symmetric positive definite set of linear equations whose coefficient matrix is a submatrix of the graph Laplacian. The remainder of this paper is organized as follows. In Section 2, we develop the hitting-time distribution and the computation of its moments. In Section 3, we describe the mixture model used to estimate vertex membership probabilities.
In Section 4 we report the results of applying the proposed approach to the stochastic block model and to a real graph describing relationships among Wikipedia pages. Finally, we close with a brief summary and some ideas for future work in Section 6.

This avoids the prohibitively expensive approach of computing sample moments via simulation over the graph. In contrast to estimating the moments by sampling the graph, the solution of the linear set of equations (4) provides an exact, deterministic estimate of the moments.
Gaussian elimination is a standard method of solution for a linear set of equations. Instead, we propose to use the conjugate gradient iteration [7], as explained below. The recursion we use is adapted from [14]. An important practical aspect of the conjugate gradient iteration is that only the application of the coefficient matrix in (5) to a vector is necessary. The number of floating point operations for each of these matrix-vector applications is roughly proportional to twice the number of edges present.
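As an illustrative sketch (not the paper's implementation), the following uses SciPy's conjugate gradient solver to obtain the first two hitting-time moments from the Laplacian submatrix. The relations for E[T] and E[T^2] below are the standard first-step recursions, which may differ in detail from the recursion of [14]; all function and variable names are ours.

```python
import numpy as np
import scipy.sparse as sp
from scipy.sparse.linalg import cg

def hitting_time_moments(A, seeds, rng_seed=0):
    """Mean and variance of the hitting time to `seeds` for the simple random
    walk on a connected, undirected graph with sparse adjacency matrix A."""
    n = A.shape[0]
    mask = np.ones(n, dtype=bool)
    mask[seeds] = False                          # non-seed vertices
    d = np.asarray(A.sum(axis=1)).ravel()        # vertex degrees
    L = (sp.diags(d) - A).tocsr()                # graph Laplacian L = D - A
    Lsub = L[mask][:, mask]                      # SPD submatrix on non-seeds

    x0 = np.random.default_rng(rng_seed).random(int(mask.sum()))  # random start

    # First moment: first-step analysis gives L_sub h = d_sub.
    h, info = cg(Lsub, d[mask], x0=x0)
    assert info == 0, "conjugate gradient did not converge"

    # Second moment: L_sub m2 = 2 * D_sub h - d_sub; variance is m2 - h^2.
    m2, info = cg(Lsub, 2.0 * d[mask] * h - d[mask], x0=x0)
    assert info == 0, "conjugate gradient did not converge"

    return h, m2 - h**2
```

Each conjugate gradient iteration costs one sparse matrix-vector product with the Laplacian submatrix, i.e., work proportional to the number of edges, matching the scaling noted above.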
As explained following (1), these probability mass functions are not generally available and so must be estimated. Our approach is to use a method of moments estimator for a chosen distribution family. Although our model could be modified to use any distribution and any number of moments, the results presented in this paper use the first two moments (mean and variance) to fit the lognormal distribution. We found that the lognormal distribution works well in practice and offers computationally fast closed-form estimators.
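For concreteness, here is that closed-form inversion (a standard identity, sketched in our notation): if $X \sim \mathrm{LogNormal}(\mu, \sigma^2)$ then $E[X] = e^{\mu + \sigma^2/2}$ and $\mathrm{Var}[X] = (e^{\sigma^2} - 1)\, e^{2\mu + \sigma^2}$, which invert as follows.

```python
import numpy as np

def lognormal_mom(mean, var):
    """Closed-form method-of-moments fit of a lognormal from a mean and a
    variance, applied independently per vertex (function name is ours)."""
    sigma2 = np.log1p(var / mean**2)     # sigma^2 = ln(1 + v / m^2)
    mu = np.log(mean) - 0.5 * sigma2     # mu = ln(m) - sigma^2 / 2
    return mu, sigma2
```

We note in passing a standard fact: hitting-time distributions on finite connected graphs have geometrically (hence exponentially) bounded tails.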
This suggests that hitting-time distributions should be modeled by distributions with exponentially decaying tails, including but not limited to the lognormal distribution. BIC is a widely used metric that encourages parsimonious models by rewarding goodness of fit while penalizing a larger number of parameters.
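For reference, the standard definition (which we assume is the one intended) for a model with $k$ free parameters, maximized likelihood $\hat{L}$, and $n$ observations is

\[
\mathrm{BIC} \;=\; k \ln n \;-\; 2 \ln \hat{L},
\]

so that lower is better: the first term penalizes model size and the second rewards goodness of fit.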
We fit the mixture model for a plausible range of g and select the model that optimizes the BIC. For each T_i, we draw m pseudo-random samples from the fitted distributions. In contrast to directly sampling hitting times from the graph, this sampling procedure does not require access to the graph, leading to substantially faster computation.
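A minimal sketch of this select-then-sample loop, assuming a hypothetical `fit_mixture` routine that returns fitted parameters, the maximized log-likelihood, and the number of free parameters; neither that routine nor the variable names come from the paper:

```python
import numpy as np

def select_and_sample(moments, g_range, fit_mixture, m, rng=None):
    """Fit the mixture for each candidate g, keep the lowest-BIC fit, then
    draw m pseudo-random hitting-time samples per vertex from its fitted
    lognormal. `moments` is a list of per-vertex (mean, variance) pairs."""
    rng = rng or np.random.default_rng()
    n = len(moments)
    best = None
    for g in g_range:
        params, loglik, k = fit_mixture(moments, g)   # hypothetical EM fit
        bic = k * np.log(n) - 2.0 * loglik            # lower is better
        if best is None or bic < best[0]:
            best = (bic, g, params)
    # Per-vertex pseudo-sampling: requires only the fitted (mu, sigma^2)
    # from the method-of-moments inversion shown earlier, not the graph.
    samples = []
    for mean, var in moments:
        s2 = np.log1p(var / mean**2)
        samples.append(rng.lognormal(np.log(mean) - 0.5 * s2,
                                     np.sqrt(s2), size=m))
    return best, samples
```

The sampling step touches only the fitted parameter pairs, which is why no access to the graph is required.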
There are multiple strategies available depending upon the prior information at hand. The solution of the linear set of equations in step 1 is accomplished by reformulating (4) as a symmetric positive definite system (5); see Section 2. We then approximate the means and variances using the conjugate gradient iteration, which requires us to specify a tolerance for terminating the iteration; we initialize the iteration with a random starting vector.
We remark that only this step requires access to the graph. The method of moments (MOM) estimator in step 2 is computed separately for each vertex.

We focus on generating data with simple and known structure to facilitate straightforward comparisons between algorithms.
Performing well on such data should not be taken as evidence that an algorithm is generally effective; however, performing poorly on such data is a cause for concern. We then analyze a large real-world dataset that is more representative of problems of interest.
Stochastic Block Model

In order to investigate the properties of the proposed method, we conducted a set of Monte Carlo simulation studies using the stochastic block model (SBM) [8].
The Monte Carlo parameter settings are described in Table 1. In each SBM simulation, two or more blocks were specified, and the first was treated as the goal block. A subset of the vertices in the goal block was randomly sampled and treated as the hitting set. We ran our algorithm and compared the identified neighbors to the true membership in the goal block using the Adjusted Rand Index (ARI; [9]) and F1 scores. For the calculation of F1 scores, the goal block was identified as the block with the smallest mean hitting times.
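To make the simulation protocol concrete, here is a toy version of the setup with illustrative sizes and probabilities (not the values from Table 1); `predicted` is a random stand-in for the labels HITMIX would produce:

```python
import networkx as nx
import numpy as np
from sklearn.metrics import adjusted_rand_score, f1_score

sizes = [100, 100, 100]                  # three blocks; block 0 is the goal
p_in, p_out = 0.10, 0.01                 # illustrative, not Table 1's values
probs = [[p_in if i == j else p_out for j in range(len(sizes))]
         for i in range(len(sizes))]
G = nx.stochastic_block_model(sizes, probs, seed=0)

truth = np.repeat(np.arange(len(sizes)), sizes)   # true block labels
rng = np.random.default_rng(0)
hitting_set = rng.choice(sizes[0], size=5, replace=False)  # seeds in the goal block
# G and hitting_set are where HITMIX would compute hitting-time moments.

predicted = rng.integers(0, len(sizes), size=len(truth))   # stand-in for HITMIX
print("ARI:", adjusted_rand_score(truth, predicted))
print("F1 :", f1_score(truth == 0, predicted == 0))
```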
In SBM Simulation 1, the number of blocks was varied (starting from 2), block size was fixed, the probability of in-block edges was fixed, and the probability of out-block edges was set to a fixed value; the specific settings are given in Table 1.
Blocks b = 2, 3, ...; percentiles are taken across Monte Carlo runs.

In the subsequent simulations, the probability of an out-block edge was fixed, block size was fixed, and the probabilities of in- and out-block edges were fixed; again, see Table 1 for the specific settings. In order to investigate the relationship between hitting-set size and neighbor detection performance, we varied the size of the hitting set.

The network is an undirected graph composed of Wikipedia hyperlinks collected in September. Vertices represent articles, edges represent hyperlinks between pages, and communities represent article categories.
The graph is a subset of Wikipedia: the data curators first took the largest strongly connected component of the graph, and then restricted it to article categories with at least pages. The dataset contains about 1. Article categories are not mutually exclusive, and span a wide range of topics. The diameter of the graph (the longest shortest path) is 9, and the 90th percentile shortest distance between all node pairs is 3. We constructed a goal set, i.e.
Or, we could say, all later times and all previous times. The relevant sense of 'fixes', of course, needs explaining. With this rough picture in place, goes the standard approach, we can turn to the alternative theories. There are four. The first is the causal theory, captured by the maxim 'every event has a cause', or perhaps 'every event has a sufficient cause'.
There are a number of standard objections to the causal theory. One is that it requires an account of causation, which is problematic. As Earman puts it, "it explains a vague concept - determinism - in terms of a truly obscure one - causation" (Earman, p. ). Another objection sometimes raised at this point is that causation is asymmetric with respect to time - causes occur before their effects - whereas determinism is a symmetric notion, as we saw with the case of the box of gas molecules.
A second theory of determinism is Popper's predictability theory. A system is deterministic if and only if all its states are predictable by the right kind of being with knowledge of the present state and the laws of nature. For Popper the "right kind of being", which he sometimes calls the superscientist, is itself part of the universe, with a large but finite capacity for knowledge.
The usual objection to this view is that it confuses epistemology with ontology - predictability being an epistemic notion and determinism being an ontological matter (cf. Atmanspacher in this volume) - as illustrated by the fact that it makes deterministic chaotic systems indeterministic. A third theory is the mathematical function theory of Russell. On this account the universe is deterministic just if there exists a functional relation relating variables at a given time to variables at all other times.
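The formula Russell has in mind, reconstructed here from his 'On the Notion of Cause' rather than quoted from the present text, runs roughly

\[
e_t \;=\; f(e_{t_1}, t_1,\; e_{t_2}, t_2,\; \ldots,\; e_{t_n}, t_n,\; t),
\]

where $e_t$ is the state of the system at time $t$: the state at any time is a function of the states at given times $t_1, \ldots, t_n$ and of $t$ itself.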
The system will be 'deterministic throughout the given period' if t, in the above formula, may be any time within that period, though outside that period the formula may be no longer true. If the universe as a whole is such a system, determinism is true of the universe; if not, not.
A problem with this account, which Russell himself notes, is that it holds, trivially, of any possible world. Any possible set of data admits of a function, however complex, of the above form. But determinism is contingent - it is possible that the world is deterministic and it is possible that the world is not.
The fourth account utilizes the notion of physical necessity. In the modal-nomic theory of Earman, a physically possible world W is deterministic just if, for any other physically possible world W', if W and W' agree at any time then they agree at all times (Earman, p. ).
The difficulty facing this view is to give an account of laws. For an empiricist like Earman, this means providing an account of the distinction between accidental and nomological regularities. Nevertheless, an account like this would be the front runner for the appropriate theory of determinism.
Having answered the question of what determinism is, so the standard picture goes, we can now turn to science to settle whether the world is relevantly deterministic. However, I would argue that the standard picture has already changed the topic. A slide has occurred in the above reasoning. We began by asking about the concept of determinism as it appears as the object of people's concerns in relation to free will, and we ended up with an answer more appropriate as an account of determinism as it appears in science.
Folk determinism is asymmetric, whereas scientific concepts based on functions, laws, or predictability are all probably symmetric in time, given our current understanding of scientific laws. It is therefore evidence for this slide that the causal theory was, in the standard picture, taken to be defective on account of its asymmetry.
I will now give three arguments for the conclusion that the folk concept of determinism is asymmetric. First, retrodiction of our actions or decisions does not concern us in the way that determinism does. Suppose, for example, that tomorrow's state of the world, together with the laws of nature, entails that I decided tonight to go for a drink. This does not raise any concerns in relation to my free will in deciding to go for a drink. It just means that my choice was predicted by its effects, not that my choice was determined.
It follows, firstly, that retrodiction is not a case of determinism and, since retrodiction is prediction, that determinism is not predictability. Secondly, the problem is the asymmetry of determinism. Prediction is symmetric with respect to time, since the notion of predictability is based on laws of nature, which we take to be symmetric in time. But our example illustrates that although we do not worry about our actions or decisions being predicted in the past, the same does not apply from the future.
Hence we have reason to think that the folk notion of determinism is asymmetric with respect to time.

A second argument concerns cases where my action or decision is a necessary cause of an outcome. The point is that the mere fact that my choice or action leads to something that could not have come about any other way in the circumstances does not mean that my free will has been violated.
Suppose my father sends me money for an air-ticket to Spain. I am poor, such that there is no way in the circumstances that I could have gone to Spain apart from my father's generosity.
I decide to go. His sending the money is a necessary but not sufficient cause of my going to Spain. Then there will be no physically possible world, identical to our world at the time of my going to Spain, in which I go to Spain but am not sent money by my father.
It follows from the appropriate refinement of the modal-nomic theory that my father's sending the money is determined. But no folk worrier would thereby be concerned about my father's free will in sending me the money, or his choice to do so, I claim. Therefore the modal-nomic theory of determinism is false. Again, the problem is asymmetry. The laws behind the modal-nomic theory are symmetric in time; the folk notion of determinism is not.
An objection here might go as follows. Suppose a child is drowning in a river, and the only way the child's life could be saved was if I dove into the river, which I did, saving the child. I say I felt compelled: I had to do it, I had no choice. Given this, we might conclude that cases where my action or decision is a necessary cause of an outcome do, in and of themselves, raise exactly the kind of worry that folk have in connection with free will, and that my previous argument is wrong.
However, I think this objection fails. The compulsion here is not from the future. Rather it arises from my own past mental states, anticipating the future. Seeing that there was no other way the child could be saved and not wanting that to happen, I was compelled by my own past thoughts, desires and character to do what I did. Whether this is free will or not I leave aside, but it is not a worry that arises on account of the mere fact that my action is a necessary cause. However, it is conceivable that there is such a case.
Suppose I take seriously the claim that my decision to go drinking tonight is caused by the action of a voodoo doctor next Sunday. More discussion will be given below about the significance of such a possibility for my thesis of the asymmetry of determinism.
Suppose I am also told that that same decision is caused by my genes and upbringing. Then I have a prima facie worry about overdetermination. How can something be determined by a future event and by a past event?