Genetic Algorithm Search for Optimum Metrics of Synthesized Social Networks

Daniel O’Neil and Mikel Petty

2018-06-25

Overview

Running the genetic-algorithm-based search function requires data files that specify a real-world network, parameters for the search function, and a compatibility table. For details about the functions called in this vignette, refer to the GenSynthNetMet manual. The code in this vignette is as follows:

knitr::opts_chunk$set(
  collapse = TRUE,
  comment = "#>"
)
library(GenSynthNetMet)

zz <- file("../results/gaMet_logfile.txt", open="a+")

cpat <- as.matrix(read.csv("../data/CustomCompatibilityTable_MBTI.csv"))
gaMetIns <- as.data.frame(read.csv("../data/gaMetInputs.csv",header=FALSE))

index <- 9      # Row number from which to retrieve parameters.

iSampleNum <- 40     # number of networks to generate
iPopulationSize <- as.numeric(gaMetIns[index,1])
timelimit <- as.numeric(gaMetIns[index,2])
fsociomatrix <- toString(gaMetIns[index,3])
fbestmetrics <- toString(gaMetIns[index,4])
fresults <- toString(gaMetIns[index,5])
fsimteam <- toString(gaMetIns[index,6])
freport <- toString(gaMetIns[index,7])
reportTitle <- toString(gaMetIns[index,8])
frawReport <- toString(gaMetIns[index,9])
fRNMets <- toString(gaMetIns[index,10])
fSynthMets <- toString(gaMetIns[index,11])

kickoff_time <- Sys.time()

writeLines(toString(colnames(cpat)),zz)
writeLines(c(paste("population: ",iPopulationSize),
             paste("sociomatrix: ",fsociomatrix),
             paste("start time: ", kickoff_time), 
             paste("timelimit: ",timelimit, " hours"), 
             paste("report: ", freport)),zz)

# Print status messages.
gaMetricsSearch(iSampleNum,iPopulationSize,fsociomatrix,fbestmetrics, cpat, 
                           fresults,fsimteam, fRNMets, fSynthMets, timelimit)
## [1] "end time:  2018-06-25 15:36:05"
## [1] "goodness:  0.98125 fitness : FALSE score : 11"
## [1] "goodness:  0.95 fitness : FALSE score : 12"
## [1] "goodness:  1 fitness : TRUE score : 13"
## [1] "goodness:  1 fitness : TRUE score : 14"
## [1] "goodness:  0.99375 fitness : TRUE score : 15"
rpt <- as.data.frame(read.csv(fbestmetrics))
write.table(rpt,file=frawReport)
fmtbl <- generateTable(rpt,reportTitle)
writeLines(fmtbl,freport)             # Write the HTML file 

integratedReport <- file("../results/gaIntegratedReport.html", open="a+")
writeLines(fmtbl,integratedReport)
writeLines("<p></p><br><hr>",integratedReport)

completion_time <- Sys.time()
writeLines(c(paste("end time: ", completion_time),"-----------------------"),zz)

close(zz)
close(integratedReport)

fmtbl

Network: Robins Australian Bank
Algorithm: GA Metrics Search

Metrics                         Tx       S  |Tx-S|       R  |Tx-R|
Number of nodes             11.000  11.000   0.000  11.000   0.000
Number of links             16.000  16.000   0.000  16.200   0.200
Number of Components         1.000   1.200   0.200   1.300   0.300
Network density              0.291   0.291   0.000   0.295   0.004
Degree average               2.909   2.909   0.000   2.945   0.036
Degree std.dev.              1.868   1.371   0.498   1.268   0.600
Cluster Coefficient          0.375   0.273   0.102   0.256   0.119
Avg. Cluster Coefficient     0.405   0.299   0.106   0.284   0.121
Mean path length             2.018   2.356   0.338   2.617   0.599
Avg.Betweeness               5.091   4.986   0.105   4.905   0.186
Max.Betweeness              25.167  17.049   8.118  14.778  10.388
Avg.Closeness                0.052   0.047   0.005   0.046   0.006
Min.Closeness                0.038   0.031   0.007   0.032   0.006
Avg.Eigencentrality          0.492   0.586   0.094   0.584   0.092
Network radius               2.000   2.050   0.050   1.825   0.175
Avg.Eccentricity             3.091   3.198   0.107   3.209   0.118
Network diameter             4.000   4.025   0.025   4.050   0.050

Discussion

The gaMetInputs.csv file, located in the data directory, identifies adjacency matrix text files and path names. The path names point to directories for results and reports. The index variable specifies the row of gaMetInputs.csv from which parameters are retrieved. For information about the adjacency matrix text files in the data directory, refer to the GenSynthNetMet manual.

Shell Scripts for Supercomputer

The smallest real-world social network was used to demonstrate the code in this vignette. To simulate the synthesis of larger social networks, shell scripts were written to execute the code on a supercomputer. This work was made possible in part by a grant of high performance computing resources and technical support from the Alabama Supercomputer Authority.

Starting independent jobs on the supercomputer involved two shell scripts. The following shell script writes the number one to a text file and then calls another shell script inside a for loop.

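#!/bin/bash
# bash is required here: the {1..14} brace expansion is not POSIX sh.

# Reset the counter file
COUNTERFILE="counterfile.txt"
echo 1 > "$COUNTERFILE"

# Call the job script 14 times, pausing 3 seconds between calls.
for i in {1..14}
  do
    echo "Start job number $i."
    run_script executeGAMet.sh
    sleep 3
  done
exit 0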

The second shell script starts a job on the supercomputer and calls the R script presented in this vignette.

#!/bin/bash

source /opt/asn/etc/asn-bash-profiles-special/modules.sh
module load R/3.3.3
R CMD BATCH executeGAMet.r

Adding the following lines of R code after the line that opens the log file enables a counter that is incremented by each job started by the shell scripts above. The code reads the current index from the text file, increments the value, and writes the new number back to the file. Each job therefore runs the same code but with the next index. As noted in the comment in the first code chunk of this vignette, the index determines the row from which parameters for the search function are retrieved.

# Open the counter file, read the current index, increment it,
# and store the updated index in the counter file.
index <- as.integer(scan(file="counterfile.txt"))
print(paste("Current index is ",index))

incrementedIndex <- index + 1
write(incrementedIndex, file="counterfile.txt")
writeLines(paste("Incrementing index to ",incrementedIndex),zz)
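
The counter handoff can also be traced from the command line before any jobs are submitted. The following is a minimal sketch and is not part of GenSynthNetMet: the function names reset_counter and claim_index are hypothetical, and a temporary file stands in for counterfile.txt. It assumes that jobs read and rewrite the counter one at a time, since nothing in the scheme locks the file.

```shell
#!/bin/sh
# Illustrative stand-in for the counter handoff; names are hypothetical.
COUNTERFILE=$(mktemp)                 # temporary stand-in for counterfile.txt
trap 'rm -f "$COUNTERFILE"' EXIT      # remove the temporary file on exit

# Reset the counter to 1, as the launcher script does before its loop.
reset_counter() {
  echo 1 > "$COUNTERFILE"
}

# Read the current index, store index + 1, and print the index just read.
# This mirrors the scan/write pair in the R code.
claim_index() {
  index=$(cat "$COUNTERFILE")
  echo $((index + 1)) > "$COUNTERFILE"
  echo "$index"
}

reset_counter
for job in 1 2 3
  do
    echo "job $job uses row $(claim_index)"
  done
echo "next index: $(cat "$COUNTERFILE")"
```

Each simulated job receives a distinct row number, and the file ends up holding the index for the next job. If two jobs were to read the file at the same moment they would receive the same index, so it may be worth checking the log file for repeated index values after a batch completes.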