• 一些有空间特性的数据结构

      1 comment

    1.1.2 EXCELL [Tamminen 1982]( Extendible CELL)

    the EXCELL method decomposes the universe regularly: all grid cells are of equal size.( In order to maintain this property in the presence of insertions, each new split results in the halving of all cells and therefore in the doubling of the directory size.)

    1.1.3 The Two-Level Grid File [Hinrichs 1985]

    The basic idea of the two level grid file is to use a second grid file to manage the grid directory.( The first of the two levels is called the root directory,which is a coarsened version of the second level, the actual grid directory.)个人觉得就是一级索引而二级索引的意思,问题是两个level可能会相互影响造成麻烦

    1.1.4 The Twin Grid File [Hutflesz et al. 1988b]

    (目的是提高空间利用率相对于original grid file,to increase space utilization)twin 意味着这两个grid file不是层次结构,两个grid file都要span整个空间,对于两个file ,P和S,如果increase in the number一方大于另一个方的时候才会造成point的转移,(P转移point到S,S overflow了而P可能反过来merge了)

    1.1.5 Multidimensional Linear Hashing

    uses no or only a very small directory(occupy relatively little storage compared to extendible hashing and usually possible to keep all relevant information in main memory) Early proposals failed to support the range queries,

    MOLHPE :a variant of linear hashing called multidimensional order-preserving linear hashing with partial expansions [Kriegel and Seeger 1986] guarantee a modest file growth,缺点是对于nonuniform distributions data,它不能gracefully adapt to them,(question 1)这是由于hashing function的缺点造成的,但是对于uniform distribution,它outperform其他的对手

    quantile hashing [Kriegel and Seeger 1987, 1989].

    利用a-quantiles来解决question1,将不是均匀分布的transformed into uniformly

    distributed values for a,该作者提出了linear order-preserving (PLOP) hashing来适应extended objects

    dynamic z-hashing [Hutflesz et al.1988a]

    它有更好的order preserving properties,uses a space-filling technique called z-ordering(disadvantage 是会generate 一些无用的block,这点与interpolation-based grid file [Ouksel 1985]相似)也不是对于不同的分布有影响

    1.2 Hierarchical Access Methods

    PAM based on binary or multiway tree structure,除了混合结构BANG file 和buddy tree,其他都不需要地址计算,而是通过buckets来组织数据,overlapping regions和partial coverage这两种技术可以用来提高SAM的性能

    1.2.1 The k-d-B-Tree [Robinson 1981]

    Balanced,没有最小的空间开销可以确保,leaf node存储 data points located in the responding partition,类似于adaptive k-d-tree和B-tree

    1.2.2 LSD-Tree [Henrich et al.1989]( Local Split Decision)

    adapts well to data that are nonuniformly distributed and that it is therefore well-suited for use in connection with the transformation technique;组织结构如adaptive k-d-tree,整个universe被partition到各种不同大小的disjoint cell.用特别的paging算法来保证preserve external balancing property;

    用两种策略来accommodate skewed data,

    two split strategies :

    ·data-dependent(SP1) local decision,equal number of objects on both sizes of the splits

    ·distribution-dependent (SP2):split is done at fixed dimension and position,assumed an underlying distribution

    SP = aSP1 +(1-a) SP2 . a 是由一个经验获得的因素,it vary as deleted and inserted operation.

    提高storage utilization的策略就是redistribute data among buckets.improve search performance for nonpoint data and range queries,auxiliary informantion on the existing data regions along with the index entries.

    1.2.3 The Buddy Tree [Seeger and Kriegel 1990].

    dynamic hashing scheme with structured directory: The tree is constructed by consecutive insertion,cutting the universe recursively into two parts of equal size with iso-oriented hyperplanes

    properties:

    (1) each directory node contains at least two entries;(not be balanced,leaves maybe not in the same level)

    (2) whenever a node n is split, the MBBs Id(ni) and Id(nj) of the two resulting subnodes ni and nj are recomputed to reflect the current situation;and(achieve a high selectivity at the directory level,)

    (3) except for the root of the directory, there is exactly one pointer referring to each directory page.

    Property1,3保证了linear的growth of the directory,use k-d-tree to avoid deadlock of grid file,Only restrict is the number of buddies.

    1.2.4 The BANG File [Freeston 1987]

    (Balanced And Nested Grid) difference from grid file : bucket regions may intersect form nonrectangular bucket regions. nested interpolation- based grid file close to the BANG tree,the major difference is the organized directory.BANG 用 spanning split 来取得 high storage utilization这样可能会造成 整目录的traversal in depth-first manner,Freeston [1989a]采用了不同的spitting strategys ,对于extended objects,用centrorid 来确定重心。Freeston [1989b]

    1.2.5 The hB-Tree [Lomet and Salzberg 1989, 1990]

    (holey brick tree)和k-d-B tree 的相似点是两者都采用了k-d-trees来用内部节点来组织空间,两者最显著的区别是 node spitting是基于multiple attributes ,与BANG file相似,它也是fractal structure (a hole brick) with an external enclosing region and several cavities called extracted regions(萃取空间).这种结构避免了cascadding of split(瀑布式的).严格意思上说,非tree结构而是而是有向无圈图directed acyclic graphs

    Searching is similar to the k-d-B-tree.Insertiong is carried out analogously to the k-d—B-tree.

  • 遗传算法小记

      0 comments

    英文名字是genetic algorithm,注意和genetic programming的区别(参考于http://www.geneticprogramming.com/Tutorial/上面的说法,主要的区别是两者的表示)

    应用范围,NP-complete问题(也就是除了穷举法以外没有其他办法的解决方案的问题,例如排课表,求函数最大值,求路径,囚徒困境(prisoner dilemma)这算是是一种探索性方法),就我的理解,是一种对穷举法的优化。其实用贪婪算法也是可以做到的。这个应该算是一种解决策略,就像数学归纳法一样。

    先谈下理论依据,昨天和师兄探讨了下,这个东西似乎没有什么研究价值,因为缺乏必要的数学依据,也就是严格的证明,而是通过一种类似于公认事实的理论来论证的:遗传学。

    先取一个种族(population)也就是初始化的过程,然后基于这个初始化的种族,我们进行评估(evaluate也就是计算它的fitness),接着进行排序,然后是很重要的一步:select(类似于自然选择),同时对劣势的population进行crossover和mutate操作,这样就有一个新的种族。下面举两个例子。(可以参考这两个链接:1.http://www.studa.net/pc-Theory/080505/1633205.html 2.http://www.cnblogs.com/waterflier/archive/2005/05/11/153545.html

    课表问题(可参考Soolmaz Massoodian, Afsaneh Esteki  在2009年2月写的A Hybrid Genetic Algorithm for Curriculum Based Course Timetabling A Hybrid Genetic Algorithm for CurriculumBased Course Timetabling)

    1.先对问题进行描述,对约束条件进行分类,分成hard constraint(即必须要满足的条件)和soft constraint(尽量要满足的条件),对这些条件进行加权,显然hard constraint是要大大高于soft constraint的权值的。对于排课表这样子的问题,可以采用二维数组来实现,这样也可以避免出现冲突的困境。可以参见Soolmaz Massoodian的这篇paper。

    2.然后将数据normalize(按照步骤1所示)

    3.对照约束表,进行fitness的评估,sort

    4.对不好的population进行mutate和crossover操作。

    5.反复循环2-4过程直到满足终止条件

    提供的pseudo code如下:

    01 //Reads all the date from file
    02 ReadData();
    03 //Assigns integer values to the lectures
    04 EncodeData();
    05 //Sorts the courses to place lectures of more constrained courses in the
    06 timetable first
    07 sortCourses(CourseList);
    08 //Creates a semi-random population of timetables
    09 Initialize();
    10 for(All timetables)
    11 //Calculates the fitnessHard value
    12 fitnessHARD(timetable);
    13 generations=1;
    14 do {
    15 //SELECTION POOL
    16 //Selects the better half of the population for the first half of the new
    17 population
    18 SelectBetterHalf();
    19 //Selects the other half of the new population using tournament-3
    20 SelectOtherHalf();
    21 //CROSSOVER
    22 //Crossover is not performed on the better half of population
    23 for(i=population_size/2;i<population_size;i++)
    24 {
    25 Choose a random number a;
    26 if(a<=crossover rate)
    27 //parent1 will be replaced by the offspring of parent1 and parent2
    28 Crossover(parent1,parent2,parent1);
    29 If (a feasible timetable has not found yet)
    30 // Calculate only the violations of hard constraints for timetables;
    31 fitnessHARD(i);
    32 else
    33 If(Maximum and average fitness of the population are very close)
    34
    35 //To avoid convergence
    36
    37 Increase mutation rate;
    38
    39 //sorts the chromosome pool to prevent performing mutation on a good
    40
    41 timetable created by crossover
    42
    43 sortpool(pool);
    44
    45 //MUTATION
    46
    47 //Mutation is not performed on good chromosomes
    48
    49 for(i=population_size/4;i<population_size;i++)
    50
    51 {
    52
    53 Choose a random number a;
    54
    55 if(a<=mutation_rate)
    56
    57 mutation(i);
    58
    59 If (a feasible timetable has not found yet)
    60
    61 fitnessHARD(i);
    62
    63 else
    64
    65 fitness(i);
    66
    67 }
    68
    69 If(Max fitness of this generation>Max fitness of the previous generation)
    70
    71 If(A feasible timetable has not found yet)
    72
    73 Local search is applied to best chromosome;
    74
    75 else
    76
    77 Local search is applied to the columns of the best chromosome;
    78
    79 } while(!( (time is overOR (a timetable for which all soft and hard constraints
    80
    81 are satisfied is found)))

    计算最大值问题 计算诸如f(x) = (x-1)(x-2) 在区间[1,6]上的最小值

    1.先将[1,6]划分为若干块,比如我划分为200,那么就要穷举这500个点,求最大值。decode,也就是将其转换成染色体编码,500小于512,故而需要9位

    2.建立一个individuals的样本池(pool),也就是initial过程。

    3.对前面的pool进行评估,可以采用 评估函数:E(x) = 100 – (x-1)(x-2),因为求最小值,所以是减去,求得它的累加和,对每个样本求其份额。接着用赌轮法进行select

    下面是 http://www.cnblogs.com/waterflier/archive/2005/05/11/153545.html 上提到的一个简单的赌轮的例子

    13%               35%                    15%                 37%
    ———-|—————————-|————|-*————————-|
    个体1              个体2                  个体3    ^0.67    个体4

    随机数为0.67落在了个体4的端内.本次选择了个体4.

    4. 选择父代和母代进行mutation,然后还有crossover

    5.依然是重复该过程。

    最后附上我写的java代码,因为一直怀疑可行性,实际写代码进行下来效果也不是很好。以后再继续深入了解

    显示breed类,用于繁殖后代:

    1 public class Breed {
    2 Chromosome father;
    3 Chromosome mother;
    4
    5 public Breed() {
    6 father = new Chromosome();
    7 mother = new Chromosome();
    8 }
    9 }

    接着是Chromosome类,染色体类

    1 public class Chromosome {
    2 String content;
    3 double p;
    4 double f;
    5 }

    最后是整个过程:

    Java语言Codee#9551
    001 package test;
    002
    003 import java.util.ArrayList;
    004 import java.util.Iterator;
    005
    006 public class Main {
    007
    008 static int geneNum = 32;
    009 static int individualSize = 16;
    010 static int parentNum = 8;
    011
    012 /**
    013 * @param args
    014 */
    015 public static void main(String[] args{
    016 ArrayList<Chromosome> set = new ArrayList<Chromosome>();
    017 ArrayList<Chromosome> parents = new ArrayList<Chromosome>();
    018
    019 // initial
    020 for (int i = 0i < individualSizei++) {
    021 Chromosome temp = new Chromosome();
    022 temp.content = normalize((int) ((geneNum - 1* Math.random()));
    023 set.add(temp);
    024 }
    025
    026 for (int i = 0i < 10000000i++) {
    027 // Evaluate
    028 double totalV = 0;
    029 for (Chromosome chromosome : set{
    030 int x = Integer.valueOf(chromosome.content, 2);
    031 double f = 800.0 - ((doublex - 5.0* ((doublex - 5.0);
    032 chromosome.f = f;
    033 totalV += f;
    034 }
    035 for (Chromosome chromosome : set{
    036 chromosome.= chromosome.f / totalV;
    037 }
    038
    039 // select
    040 for (int j = 0j < parentNumj++) {
    041 parents.add(select(set));
    042 }
    043
    044 set.clear();
    045
    046 // crossover 8个交叉
    047 crossover(set, parents);
    048
    049 // mutate 8个变异
    050 mutate(set, parents);
    051
    052 parents.clear();
    053 }
    054
    055 for (Iterator<Chromosome> iterator = set.iterator(); iterator.hasNext();) {
    056 Chromosome chromosome = (Chromosomeiterator.next();
    057 System.out.println(chromosome.content);
    058 }
    059 System.out.println();
    060
    061 }
    062
    063 // crossover
    064 private static void crossover(ArrayList<Chromosome> set,
    065 ArrayList<Chromosome> parents{
    066 ArrayList<Breed> breed = new ArrayList<Breed>();
    067 // 配对
    068 for (Iterator<Chromosome> iterator = parents.iterator(); iterator.hasNext();) {
    069 Chromosome father = (Chromosome)iterator.next();
    070 Chromosome mother = null;
    071 ifiterator.hasNext()){
    072 mother = (Chromosome)iterator.next();
    073 }else{
    074 break;
    075 }
    076 Breed breedElementBreed = new Breed();
    077 breedElementBreed.father = father;
    078 breedElementBreed.mother = mother;
    079 breed.add(breedElementBreed);
    080 }
    081
    082 // crossover
    083 for (Iterator<Breed> iterator = breed.iterator(); iterator.hasNext();) {
    084 int bit = (int)(4*Math.random());
    085 if(bit == 4){
    086 bit–;
    087 }
    088 if(bit == 0){
    089 bit++;
    090 }
    091 Breed breed2 = (Breediterator.next();
    092 Chromosome childI = new Chromosome();
    093 Chromosome childII = new Chromosome();
    094 childI.content = breed2.father.content.substring(0,bit+1+ breed2.mother.content.substring(bit+1);
    095 childII.content = breed2.mother.content.substring(0,bit+1+ breed2.father.content.substring(bit+1);
    096 set.add(childI);
    097 set.add(childII);
    098 }
    099 breed.clear();
    100 }
    101
    102 // mutate
    103 private static void mutate(ArrayList<Chromosome> set,
    104 ArrayList<Chromosome> parents{
    105 for (Chromosome chromosome : parents{
    106 int bit = (int) (4 * Math.random());
    107 String content = chromosome.content;
    108 if (bit == 0{
    109 Chromosome c = new Chromosome();
    110 c.content = mutateBit(content.charAt(0)) + content.substring(1);
    111 set.add(c);
    112 } else if (bit == 4{
    113 Chromosome c = new Chromosome();
    114 c.content = content.substring(0, 4)
    115 + mutateBit(content.charAt(4));
    116 set.add(c);
    117 } else {
    118 Chromosome c = new Chromosome();
    119 c.content = content.substring(0, bit)
    120 + mutateBit(content.charAt(bit))
    121 + content.substring(bit + 1);
    122 set.add(c);
    123 }
    124 }
    125 }
    126
    127 private static char mutateBit(char pre{
    128 if (pre == ‘0′{
    129 return ‘1′;
    130 } else if (pre == ‘1′{
    131 return ‘0′;
    132 } else {
    133 throw new RuntimeException();
    134 }
    135 }
    136
    137 // 圆盘算法
    138 private static Chromosome select(ArrayList<Chromosome> set{
    139 double key = Math.random();
    140 Iterator<Chromosome> iterator = set.iterator();
    141 Chromosome result = null;
    142 double totalP = 0.0;
    143 while (iterator.hasNext()) {
    144 result = (Chromosomeiterator.next();
    145 totalP += result.p;
    146 if (totalP > key{
    147 break;
    148 }
    149 }
    150 // System.out.println(totalP);
    151 //         System.out.println((result!=null));
    152 return result;
    153 }
    154
    155 private static String normalize(int x{
    156 if ((x < 0&& (x > geneNum))
    157 return null;
    158 String result = Integer.toBinaryString(x);
    159 if (result.length() < 5{
    160 while (result.length() < 5{
    161 result = ‘0′ + result;
    162 }
    163 }
    164 return result;
    165 }
    166
    167 }

    文章比较水,故而推荐两本书最为补偿:

    Springer出版社的 Introduction To Genetic Algorithms 这本讲的例子比较多用matlab和c++都实现了一个例子

    MIT出版社的  An Introduction To Genetic Algorithms 这本讲的比较详细

    p.s. 本文是代码美化是在: http://fayaa.com/code/home/ 上处理的。