Monte Carlo Based Search for Optimum Metrics of Synthesized Social networks

Daniel O’Neil and Mikel Petty

2018-06-25

Overview

Executing a Monte Carlo based search function involves data files that specify a real world network, parameters for the search function, and a compatibility table. For details about the functions called in this vignette, refer to the GenSynthNetMet manual. Code in this vignette:

knitr::opts_chunk$set(
  collapse = TRUE,
  comment = "#>"
)
library(GenSynthNetMet)

zz <- file("../results/gaMet_logfile.txt", open="a+")

cpat <- as.matrix(read.csv("../data/CustomCompatibilityTable_MBTI.csv"))
mcMetIns <- as.data.frame(read.csv("../data/mcMeanMetInputs.csv",header=FALSE))

index <- 9      # Row number from which to retrieve parameters.

iSampleNum <- 40     # number of networks to generate
 timelimit <- as.numeric(mcMetIns[index,1]) 
 fsociomatrix <- toString(mcMetIns[index,2]) 
 fbestmetrics <- toString(mcMetIns[index,3]) 
 fresults <- toString(mcMetIns[index,4])        
 fsimteam <- toString(mcMetIns[index,5])
 freport <- toString(mcMetIns[index,6])        
 reportTitle <- toString(mcMetIns[index,7])
 frawReport <- toString(mcMetIns[index,8])
 fRNMets <- toString(mcMetIns[index,9])
 fSynthMets <- toString(mcMetIns[index,10])

kickoff_time <- Sys.time()

writeLines(toString(colnames(cpat)),zz)
writeLines(c(paste("sociomatrix: ",fsociomatrix),
             paste("start time: ", kickoff_time), 
             paste("timelimit: ",timelimit, " hours"), 
             paste("report: ", freport)),zz)
             
mcSearch(iSampleNum,fsociomatrix, fbestmetrics, cpat, 
                    fresults,fsimteam, fRNMets, fSynthMets, timelimit)
## [1] "end time:  2018-06-26 06:45:32"
rpt <- as.data.frame(read.csv(fbestmetrics))
write.table(rpt,file=frawReport)
fmtbl <- generateTable(rpt,reportTitle)
writeLines(fmtbl,freport)             # Write the HTML file 

integratedReport <- file("../results/mcIntegratedReport.html", open="a+")
writeLines(fmtbl,integratedReport)
writeLines("<p></p><br><hr>",integratedReport)

completion_time <- Sys.time()
writeLines(c(paste("end time: ", completion_time),"-----------------------"),zz)

fmtbl

Network: Robins Australian Bank
Algorithm: Monte Carlo Mean Metrics Search

Metrics Tx S |Tx-S| R |Tx-R|
Number of nodes 11.000 11.000 0.000 11.000 0.000
Number of links 16.000 16.000 0.000 15.875 0.125
Number of Components 1.000 1.300 0.300 1.400 0.400
Network density 0.291 0.291 0.000 0.289 0.002
Degree average 2.909 2.909 0.000 2.886 0.023
Degree std.dev. 1.868 1.379 0.489 1.354 0.515
Cluster Coefficient 0.375 0.275 0.100 0.249 0.126
Avg. Cluster Coefficient 0.405 0.311 0.094 0.308 0.096
Mean path length 2.018 2.500 0.482 2.728 0.710
Avg.Betweeness 5.091 4.818 0.273 4.707 0.384
Max.Betweeness 25.167 15.947 9.219 15.572 9.595
Avg.Closeness 0.052 0.045 0.006 0.044 0.008
Min.Closeness 0.038 0.030 0.009 0.028 0.011
Avg.Eigencentrality 0.492 0.586 0.094 0.589 0.096
Network radius 2.000 1.800 0.200 1.500 0.500
Avg.Eccentricity 3.091 3.089 0.002 3.109 0.018
Network diameter 4.000 3.825 0.175 3.950 0.050

Discussion

The mcMeanMetInputs.csv file, located in the data directory, identifies the adjacency matrix text files for real world social networks and the path names. The path names point to directories for results and reports. The index variable specifies the row number from which to retrieve parameters from the gaMetInputs file. For information about the adjacency matrix text files in the data director, refer to the GenSynthNetMet manual.

Shell Scripts for Supercomputer

The smallest real world social network was used for demonstrating code in this vignette. To simulate the synthesis of larger social networks, shell scripts were written to excecute the code on a super computer. This work was made possible in part by a grant of high performance computing resources and technical support from the Alabama Supercomputer Authority.

Starting independent jobs on the supercomputer involved two shell scripts. The following shell script writes the number one to a text file and then calls another shell script within a for next loop.

#!/bin/sh

# Reset the counter file
COUNTERFILE="counterfile.txt"
echo 1 > $COUNTERFILE

for i in {1..14}
  do
    echo "Start job number $i. "
    run_script executeMCMet.sh
    sleep 3
  done
exit 0

The second shell script starts a job on the super computer and calls the R script, which was presented in this vignette.

#!/bin/sh

COUNTERFILE="counterfile.txt"
itemNumber=`cat $COUNTERFILE`
# echo "Current item Number is $itemNumber"
#SBATCH -J mcMet_%itemNumber

source /opt/asn/etc/asn-bash-profiles-special/modules.sh
module load R/3.3.3
R CMD BATCH executeMcMeanMet.r

Adding the following lines of R code after the line that opened a log file enable the use of a counter that is incremented by jobs started by the previously presented shell scripts. This code opens the text file containing the current index number and then increments the value and writes the new number back to the text file. Each time this action occurs, the same code is called but with an incremented index. As explained by the comment in the first code chunk of this vignette, the index determines which row from which to retrieve parameters for the search function.

# Open the counter file, read the current index, increment it,
# and store the updated index in the counter file.
index <- as.integer(scan(file="counterfile.txt"))
print(paste("Current index is ",index))

incrementedIndex <- index + 1
write(incrementedIndex, file="counterfile.txt")
writeLines(paste("Incrementing index to ",incrementedIndex),zz)