Arrays , Strings and Hashing - (Theory)

Comprehensive Guide: Arrays, Strings, and Hashing

ARRAYS

Fundamental Concepts

Definition and Internal Structure

An array is a homogeneous data structure that stores elements of the same type in a linear sequence. In low-level languages like C/C++, arrays are stored in contiguous memory locations, enabling efficient cache utilization and arithmetic-based indexing. However, in Java, arrays are reference types stored in the heap memory, which may not guarantee physical contiguity due to garbage collection and memory fragmentation.

Memory Model Deep Dive

In Java's memory model:

int[] singleArray = new int[1000]; // Single contiguous block
int[][] matrix = new int[100][100]; // 100 separate arrays referenced by one array

Physical vs Logical Organization

While logically arrays appear contiguous, physically in Java:

Static vs Dynamic Arrays

Static Arrays

Dynamic Arrays

Resizing Strategy Deep Dive

Dynamic arrays use growth factors:

Multi-Dimensional Arrays

2D Arrays Implementation

In Java, 2D arrays are implemented as arrays of arrays:

int[][] matrix = new int[rows][cols];
// Equivalent to:
int[][] matrix = new int[rows][];
for(int i = 0; i < rows; i++) {
    matrix[i] = new int[cols];
}

Jagged Arrays

Java supports jagged arrays where each row can have different lengths:

int[][] jaggedArray = new int[3][];
jaggedArray[0] = new int[2];  // First row has 2 elements
jaggedArray[1] = new int[5];  // Second row has 5 elements
jaggedArray[2] = new int[3];  // Third row has 3 elements

N-Dimensional Arrays

For higher dimensions, the reference chain becomes deeper:

Array Operations with Complexity Analysis

Insertion Operations

  1. At End: O(1) if space available, O(n) if resizing needed
  2. At Beginning: O(n) - requires shifting all elements
  3. At Middle: O(n) - requires shifting elements after insertion point
  4. In Sorted Array: O(n) - requires finding position + shifting

Deletion Operations

  1. From End: O(1) - simply decrement size
  2. From Beginning: O(n) - shift all elements left
  3. From Middle: O(n) - shift elements after deletion point
  4. By Value: O(n) - search + deletion

Update Operations

Traversal Patterns

Practical Scenarios for Arrays

Scenario 1: Image Processing

Arrays are fundamental in image processing where images are represented as 2D or 3D arrays:

// RGB image representation
int[][][] image = new int[height][width][3]; // height x width x RGB
// Processing pixels requires efficient random access
for(int y = 0; y < height; y++) {
    for(int x = 0; x < width; x++) {
        // Apply filter to pixel at (x,y)
        int red = image[y][x][0];
        int green = image[y][x][1];
        int blue = image[y][x][2];
    }
}

Scenario 2: Matrix Operations in Scientific Computing

Mathematical computations heavily rely on multi-dimensional arrays:

// Matrix multiplication
double[][] multiply(double[][] A, double[][] B) {
    int n = A.length, m = B[0].length, p = B.length;
    double[][] C = new double[n][m];
    
    for(int i = 0; i < n; i++) {
        for(int j = 0; j < m; j++) {
            for(int k = 0; k < p; k++) {
                C[i][j] += A[i][k] * B[k][j];
            }
        }
    }
    return C;
}

Scenario 3: Game Development - Grid-Based Games

2D arrays naturally represent game boards, maps, or grids:

// Chess board representation
enum Piece { PAWN, ROOK, KNIGHT, BISHOP, QUEEN, KING, EMPTY }
Piece[][] chessBoard = new Piece[8][8];

// Pathfinding in grid maps
boolean[][] obstacles = new boolean[mapHeight][mapWidth];
int[][] distances = new int[mapHeight][mapWidth];

Scenario 4: Time Series Data Analysis

Arrays store sequential data for analysis:

// Stock price analysis
double[] prices = new double[365]; // Daily prices for a year
double[] movingAverage = calculateMovingAverage(prices, windowSize);

// Signal processing
double[] signal = new double[sampleRate * duration];
double[] filteredSignal = applyLowPassFilter(signal);

Scenario 5: Buffer Management in Systems Programming

Arrays serve as buffers for I/O operations:

// Network packet buffer
byte[] buffer = new byte[1024];
int bytesRead = inputStream.read(buffer);

// Circular buffer for real-time data
class CircularBuffer {
    private int[] buffer;
    private int head, tail, size;
    
    public void enqueue(int value) {
        buffer[tail] = value;
        tail = (tail + 1) % buffer.length;
        if(size < buffer.length) size++;
        else head = (head + 1) % buffer.length;
    }
}

STRINGS

Immutability and Memory Implications

Why Immutability?

Java strings are immutable for several critical reasons:

  1. Security: Strings often hold sensitive data (passwords, file paths)
  2. Thread Safety: Multiple threads can safely share string references
  3. String Pool Optimization: Identical strings share memory
  4. Hash Code Caching: Hash codes can be computed once and cached

Memory Model for Strings

String s1 = "Hello";           // Stored in String Pool (Method Area)
String s2 = "Hello";           // Points to same object in pool
String s3 = new String("Hello"); // Creates new object in heap

String Pool Mechanism

The String Pool (part of the Method Area) stores unique string literals:

Mutable String Alternatives

StringBuilder vs StringBuffer

Aspect StringBuilder StringBuffer
Thread Safety Not synchronized Synchronized
Performance Faster Slower due to synchronization
Use Case Single-threaded Multi-threaded
Memory Overhead Lower Higher

Internal Implementation

Both use resizable character arrays:

// Simplified StringBuilder implementation
public class StringBuilder {
    private char[] value;
    private int count;
    
    public StringBuilder append(String str) {
        int len = str.length();
        ensureCapacity(count + len);
        str.getChars(0, len, value, count);
        count += len;
        return this;
    }
    
    private void ensureCapacity(int minimumCapacity) {
        if (minimumCapacity > value.length) {
            int newCapacity = value.length * 2 + 2;
            value = Arrays.copyOf(value, newCapacity);
        }
    }
}

String Operations and Complexity

Concatenation Analysis

// Inefficient - O(n²) overall complexity
String result = "";
for(int i = 0; i < n; i++) {
    result += "a"; // Creates new string each time
}

// Efficient - O(n) overall complexity
StringBuilder sb = new StringBuilder();
for(int i = 0; i < n; i++) {
    sb.append("a"); // Amortized O(1) per operation
}

Substring Operations

In modern Java (7+), substring creates a new character array:

Character Access and Traversal

// Random access - O(1)
char c = str.charAt(index);

// Sequential traversal - O(n)
for(char c : str.toCharArray()) {
    // process character
}

// Enhanced for loop - O(n) but with method call overhead
for(int i = 0; i < str.length(); i++) {
    char c = str.charAt(i);
}

Practical Scenarios for Strings

Scenario 1: Text Processing and Natural Language Processing

String manipulation is crucial in text analysis:

// Text preprocessing for NLP
public String preprocessText(String text) {
    StringBuilder processed = new StringBuilder();
    
    // Convert to lowercase, remove punctuation, normalize whitespace
    for(char c : text.toCharArray()) {
        if(Character.isLetterOrDigit(c)) {
            processed.append(Character.toLowerCase(c));
        } else if(Character.isWhitespace(c) && 
                  processed.length() > 0 && 
                  processed.charAt(processed.length()-1) != ' ') {
            processed.append(' ');
        }
    }
    
    return processed.toString().trim();
}

Scenario 2: Log Processing and Analysis

String operations are essential for parsing log files:

// Log entry parsing
public class LogParser {
    public LogEntry parseLine(String line) {
        // Format: [2023-12-01 10:30:45] ERROR: Database connection failed
        int timestampEnd = line.indexOf(']');
        String timestamp = line.substring(1, timestampEnd);
        
        int levelStart = timestampEnd + 2;
        int levelEnd = line.indexOf(':', levelStart);
        String level = line.substring(levelStart, levelEnd);
        
        String message = line.substring(levelEnd + 2);
        
        return new LogEntry(timestamp, level, message);
    }
}

Scenario 3: Web Development - URL and Query String Processing

String manipulation in web applications:

// URL parameter extraction
public Map<String, String> parseQueryString(String queryString) {
    Map<String, String> params = new HashMap<>();
    
    if(queryString == null || queryString.isEmpty()) {
        return params;
    }
    
    String[] pairs = queryString.split("&");
    for(String pair : pairs) {
        String[] keyValue = pair.split("=", 2);
        if(keyValue.length == 2) {
            String key = URLDecoder.decode(keyValue[0], "UTF-8");
            String value = URLDecoder.decode(keyValue[1], "UTF-8");
            params.put(key, value);
        }
    }
    
    return params;
}

Scenario 4: Configuration File Processing

Parsing configuration files and property files:

// Properties file parser
public class ConfigParser {
    public Properties parseConfig(String configText) {
        Properties props = new Properties();
        StringBuilder currentValue = new StringBuilder();
        String currentKey = null;
        
        String[] lines = configText.split("\n");
        for(String line : lines) {
            line = line.trim();
            
            if(line.isEmpty() || line.startsWith("#")) {
                continue; // Skip empty lines and comments
            }
            
            if(line.contains("=")) {
                if(currentKey != null) {
                    props.setProperty(currentKey, currentValue.toString());
                }
                
                String[] parts = line.split("=", 2);
                currentKey = parts[0].trim();
                currentValue = new StringBuilder(parts[1].trim());
            } else if(currentKey != null) {
                // Continuation line
                currentValue.append(" ").append(line);
            }
        }
        
        if(currentKey != null) {
            props.setProperty(currentKey, currentValue.toString());
        }
        
        return props;
    }
}

Scenario 5: Template Engine Implementation

String replacement and template processing:

// Simple template engine
public class TemplateEngine {
    public String processTemplate(String template, Map<String, String> variables) {
        StringBuilder result = new StringBuilder();
        int pos = 0;
        
        while(pos < template.length()) {
            int start = template.indexOf("${", pos);
            if(start == -1) {
                result.append(template.substring(pos));
                break;
            }
            
            result.append(template.substring(pos, start));
            
            int end = template.indexOf("}", start);
            if(end == -1) {
                throw new IllegalArgumentException("Unclosed variable: " + template.substring(start));
            }
            
            String varName = template.substring(start + 2, end);
            String value = variables.getOrDefault(varName, "${" + varName + "}");
            result.append(value);
            
            pos = end + 1;
        }
        
        return result.toString();
    }
}

HASHING

Hash Function Design and Theory

Mathematical Foundation

A good hash function should exhibit these properties:

  1. Deterministic: Same input always produces same output
  2. Uniform Distribution: Hash values spread evenly across the hash table
  3. Avalanche Effect: Small input changes cause large hash changes
  4. Efficiency: Fast to compute

Polynomial Rolling Hash for Strings

The polynomial rolling hash is fundamental in string hashing:

hash(s) = (s[0] * p^0 + s[1] * p^1 + ... + s[n-1] * p^(n-1)) mod m

Where:

Why Prime Numbers in Hashing?

Prime numbers are chosen because:

  1. Reduced Patterns: Primes break up regular patterns in data
  2. Mathematical Properties: Better distribution due to number theory
  3. Collision Reduction: Prime modulus reduces clustering
  4. Bit Manipulation: 31 allows optimization: 31 * x = (x << 5) - x

Java's String Hash Implementation

public int hashCode() {
    int h = hash; // Check if already computed
    if (h == 0 && value.length > 0) {
        char val[] = value;
        
        for (int i = 0; i < value.length; i++) {
            h = 31 * h + val[i];
        }
        hash = h; // Cache the result
    }
    return h;
}

This implementation:

Collision Resolution Strategies

Separate Chaining (Java's Approach)

Each hash table slot contains a bucket (linked list or tree):

// Simplified HashMap bucket structure
class HashMapEntry<K,V> {
    final K key;
    V value;
    HashMapEntry<K,V> next;
    final int hash;
    
    HashMapEntry(int hash, K key, V value, HashMapEntry<K,V> next) {
        this.hash = hash;
        this.key = key;
        this.value = value;
        this.next = next;
    }
}

Tree Conversion in Java 8+

When chain length exceeds threshold (8), Java converts to balanced tree:

Open Addressing Alternatives

  1. Linear Probing: Check next slot sequentially
  2. Quadratic Probing: Check slots at quadratic intervals
  3. Double Hashing: Use second hash function for step size

Load Factor and Resizing

Load Factor Significance

Load Factor = Number of Elements / Table Size

Java's default load factor of 0.75 balances:

Resizing Process

When load factor exceeds threshold:

  1. Create new table (typically double size)
  2. Rehash all existing elements
  3. Update table reference
  4. Time complexity: O(n) for rehashing

HashMap vs TreeMap Comparison

HashMap Characteristics

TreeMap Characteristics

Practical Scenarios for Hashing

Scenario 1: Caching and Memoization

Hash maps excel at caching computed results:

// Fibonacci with memoization
public class FibonacciCache {
    private Map<Integer, Long> cache = new HashMap<>();
    
    public long fibonacci(int n) {
        if(n <= 1) return n;
        
        if(cache.containsKey(n)) {
            return cache.get(n);
        }
        
        long result = fibonacci(n-1) + fibonacci(n-2);
        cache.put(n, result);
        return result;
    }
}

// Web request caching
public class WebCache {
    private Map<String, CacheEntry> cache = new HashMap<>();
    private static final long CACHE_DURATION = 5 * 60 * 1000; // 5 minutes
    
    public String getCachedResponse(String url) {
        CacheEntry entry = cache.get(url);
        if(entry != null && System.currentTimeMillis() - entry.timestamp < CACHE_DURATION) {
            return entry.response;
        }
        
        String response = fetchFromServer(url);
        cache.put(url, new CacheEntry(response, System.currentTimeMillis()));
        return response;
    }
}

Scenario 2: Database Indexing and Query Optimization

Hash indexes provide fast lookups:

// In-memory database index
public class DatabaseIndex {
    private Map<Object, Set<Integer>> index = new HashMap<>();
    
    public void addToIndex(Object key, int rowId) {
        index.computeIfAbsent(key, k -> new HashSet<>()).add(rowId);
    }
    
    public Set<Integer> findRows(Object key) {
        return index.getOrDefault(key, Collections.emptySet());
    }
    
    // Compound index for multiple columns
    public class CompoundKey {
        private final Object[] keys;
        
        public CompoundKey(Object... keys) {
            this.keys = keys.clone();
        }
        
        @Override
        public boolean equals(Object obj) {
            if(this == obj) return true;
            if(!(obj instanceof CompoundKey)) return false;
            return Arrays.equals(keys, ((CompoundKey) obj).keys);
        }
        
        @Override
        public int hashCode() {
            return Arrays.hashCode(keys);
        }
    }
}

Scenario 3: Distributed Systems and Consistent Hashing

Hash functions distribute data across multiple servers:

// Consistent hashing for distributed cache
public class ConsistentHashRing {
    private TreeMap<Integer, String> ring = new TreeMap<>();
    private int virtualNodes = 100;
    
    public void addNode(String node) {
        for(int i = 0; i < virtualNodes; i++) {
            String virtualNode = node + ":" + i;
            int hash = hashFunction(virtualNode);
            ring.put(hash, node);
        }
    }
    
    public String getNode(String key) {
        if(ring.isEmpty()) return null;
        
        int hash = hashFunction(key);
        Map.Entry<Integer, String> entry = ring.ceilingEntry(hash);
        if(entry == null) {
            entry = ring.firstEntry(); // Wrap around
        }
        return entry.getValue();
    }
    
    private int hashFunction(String input) {
        return input.hashCode();
    }
}

Scenario 4: Real-time Analytics and Event Processing

Hash maps track metrics and counters:

// Real-time metrics aggregation
public class MetricsAggregator {
    private Map<String, AtomicLong> counters = new ConcurrentHashMap<>();
    private Map<String, DescriptiveStatistics> metrics = new ConcurrentHashMap<>();
    
    public void incrementCounter(String name) {
        counters.computeIfAbsent(name, k -> new AtomicLong()).incrementAndGet();
    }
    
    public void recordMetric(String name, double value) {
        metrics.computeIfAbsent(name, k -> new DescriptiveStatistics()).addValue(value);
    }
    
    public Map<String, Double> getAverages() {
        return metrics.entrySet().stream()
            .collect(Collectors.toMap(
                Map.Entry::getKey,
                e -> e.getValue().getMean()
            ));
    }
}

// Session tracking for web applications
public class SessionManager {
    private Map<String, UserSession> sessions = new ConcurrentHashMap<>();
    private static final long SESSION_TIMEOUT = 30 * 60 * 1000; // 30 minutes
    
    public UserSession getSession(String sessionId) {
        UserSession session = sessions.get(sessionId);
        if(session != null && session.isExpired()) {
            sessions.remove(sessionId);
            return null;
        }
        return session;
    }
    
    public void createSession(String sessionId, String userId) {
        sessions.put(sessionId, new UserSession(userId, System.currentTimeMillis()));
    }
    
    public void cleanupExpiredSessions() {
        long now = System.currentTimeMillis();
        sessions.entrySet().removeIf(entry -> 
            now - entry.getValue().getLastAccess() > SESSION_TIMEOUT
        );
    }
}

Scenario 5: Duplicate Detection and Data Deduplication

Hash sets efficiently identify duplicates:

// File deduplication using content hashing
public class FileDeduplicator {
    private Map<String, String> hashToPath = new HashMap<>();
    private Set<String> processedHashes = new HashSet<>();
    
    public void processFile(String filePath) throws IOException {
        String contentHash = calculateFileHash(filePath);
        
        if(processedHashes.contains(contentHash)) {
            String existingPath = hashToPath.get(contentHash);
            System.out.println("Duplicate found: " + filePath + " matches " + existingPath);
            // Could create hard link or symbolic link here
        } else {
            processedHashes.add(contentHash);
            hashToPath.put(contentHash, filePath);
        }
    }
    
    private String calculateFileHash(String filePath) throws IOException {
        MessageDigest md = MessageDigest.getInstance("SHA-256");
        try(FileInputStream fis = new FileInputStream(filePath)) {
            byte[] buffer = new byte[8192];
            int bytesRead;
            while((bytesRead = fis.read(buffer)) != -1) {
                md.update(buffer, 0, bytesRead);
            }
        }
        return bytesToHex(md.digest());
    }
}

// Email address validation and normalization
public class EmailProcessor {
    private Set<String> processedEmails = new HashSet<>();
    private Map<String, String> normalizedToOriginal = new HashMap<>();
    
    public boolean addEmail(String email) {
        String normalized = normalizeEmail(email);
        
        if(processedEmails.contains(normalized)) {
            return false; // Duplicate
        }
        
        processedEmails.add(normalized);
        normalizedToOriginal.put(normalized, email);
        return true;
    }
    
    private String normalizeEmail(String email) {
        // Convert to lowercase, remove dots from local part for Gmail, etc.
        String[] parts = email.toLowerCase().split("@");
        if(parts.length != 2) {
            throw new IllegalArgumentException("Invalid email format");
        }
        
        String localPart = parts[0];
        String domain = parts[1];
        
        // Gmail-specific normalization
        if(domain.equals("gmail.com") || domain.equals("googlemail.com")) {
            localPart = localPart.replace(".", "");
            int plusIndex = localPart.indexOf('+');
            if(plusIndex != -1) {
                localPart = localPart.substring(0, plusIndex);
            }
            domain = "gmail.com"; // Normalize googlemail to gmail
        }
        
        return localPart + "@" + domain;
    }
}

Performance Comparison Summary

Operation Array String StringBuilder HashMap HashSet TreeMap TreeSet
Insertion (end) O(1) O(n) O(1) amortized O(1) average O(1) average O(log n) O(log n)
Insertion (middle) O(n) O(n) O(n) N/A N/A N/A N/A
Deletion O(n) O(n) O(n) O(1) average O(1) average O(log n) O(log n)
Update O(1) N/A (immutable) O(1) O(1) average N/A O(log n) N/A
Search/Access O(1) random, O(n) search O(1) by index O(1) by index O(1) average O(1) average O(log n) O(log n)
Traversal O(n) O(n) O(n) O(n) O(n) O(n) O(n)
Space Complexity O(n) O(n) O(n) O(n) O(n) O(n) O(n)

Key Takeaways

When to Use Arrays:

When to Use Strings:

When to Use HashMaps/HashSets:

When to Use TreeMaps/TreeSets: